Import Needed libraries¶
# Import Needed libraries
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
%matplotlib inline
from patsy import dmatrices
#import decisiontreeclassifier
from sklearn import tree
from sklearn.tree import DecisionTreeClassifier
#import randomforest classifier
from sklearn.ensemble import RandomForestClassifier
#import logisticregression classifier
from sklearn.linear_model import LogisticRegression
import statsmodels.api as sm
#import knn classifier
from sklearn.neighbors import KNeighborsClassifier
#for validating your classification model
# sklearn.cross_validation was removed from scikit-learn; these helpers now live in model_selection
from sklearn.model_selection import train_test_split
from sklearn.model_selection import cross_val_score
from sklearn import metrics
from sklearn.metrics import roc_curve, auc
# feature selection
from sklearn.feature_selection import RFE
from sklearn.ensemble import ExtraTreesClassifier
from sklearn.feature_selection import SelectKBest
from sklearn.feature_selection import chi2
# sklearn.grid_search was removed from scikit-learn; GridSearchCV now lives in model_selection
from sklearn.model_selection import GridSearchCV
from IPython.display import Image
from IPython.core.display import HTML
# 'display.height' is deprecated in pandas, so only the remaining display options are set
pd.set_option('display.max_rows', 500)
pd.set_option('display.max_columns', 500)
pd.set_option('display.width', 1000)
# input title slide here
Image('Images/christensen_finalprj/Slide1.png')
# input outline slide here
Image('Images/christensen_finalprj/Slide2.png')
# input problem description & introduction slide here
Image('Images/christensen_finalprj/Slide3.png')
Determine Business Objectives:¶
- Background: Hospitals are penalized when patients are re-admitted less than 30 days after they are released.
- Business Objectives: Reduce or eliminate re-admissions that occur less than 30 days after release.
- Success Criteria: Identification of the factors that increase the likelihood of a patient returning within 30 days.
- Business Value: The average cost in 2011 for a hospital stay was $10,000.*
- *http://www.beckershospitalreview.com/finance/11-statistics-on-average-hospital-costs-per-stay.html
# embed full problem description web image
Image(url= "https://72022e22-a-62cb3a1a-s-sites.googlegroups.com/site/christensenfinalprj/full-problem-description/christensen_finalprj-full-prob-descrp.png?attachauth=ANoY7coojyQ211rGpA2ygjc39hleeQ1Z-4R5R-B88BLBFtH1jAq2nT7l4JsRJfQhXwLYsBJ-HR_9ZFkGPGhybCqgeeBS4QVGujnYQae0dHKQCi6coswid7_6nzzAJ-riEv5DfvoEe1sls3dzBnvabtMJMesqIkRfqQSISQBI-Bdpp1ZveQ__SDPGMfWoW4kK1rmIropOOyy_2QQXBqRlq1hCp7cN6UPYzTlp54LBVweAnkfNexPQsIZDh90sH3xMlJGsDWOoDZHAHlB16u69fTPBSXvO57EqRNRejBKcU2bYYujzMRFjMp0%3D&attredirects=0")
# input key findings & insights slide here
Image('Images/christensen_finalprj/Slide4.png')
# input key final analysis & recommendation slide here
Image('Images/christensen_finalprj/Slide5.png')
# input next steps slide here
Image('Images/christensen_finalprj/Slide6.png')
Next Steps Ideas¶
- Analyze patients re-admitted close to the 30-day threshold, i.e. within 31 to 45-60 days
- Weight the data
- Cross-reference the 3 diagnoses
- Analyze the order of the 3 diagnoses
- Add more diagnoses
- Use more granular diagnosis categories
- ?
# input dataset slide here
Image('Images/christensen_finalprj/Slide7.png')
Dataset Description:¶
- Description: The dataset contains over 56,000 HIPAA-compliant, de-identified records of hospital admissions.
- Source: Hack K-State 2016 : Data Science For Social Good - https://zslie.github.io/
- Details: There are 50 columns: the visit (encounter) ID and the patient ID, plus 48 factors.
- Factors: The factors have between 1 and 715 distinct values each, which gives roughly 5.27x10^41 possible combinations.
- Factors: Descriptions below.
#embed factor descriptions 'fd'
fd = pd.read_excel('data/factor-definitions.xlsx')
fd
| | Column Name | Column Value | Type | Description and values |
|---|---|---|---|---|
| 0 | encounter_id | Encounter ID | Numeric | Unique identifier of an encounter |
| 1 | patient_nbr | Patient number | Numeric | Unique identifier of a patient |
| 2 | race | Race | Nominal | Values: Caucasian, Asian, African American, Hi... |
| 3 | gender | Gender | Nominal | Values: male, female, and unknown/invalid |
| 4 | age | Age | Nominal | Grouped in 10-year intervals: [0, 10), [10, 20... |
| 5 | weight | Weight | Numeric | Weight in pounds. |
| 6 | admission_type_id | Admission Type | Nominal | Integer identifier corresponding to 9 distinct... |
| 7 | discharge_disposition_id | Discharge Disposition | Nominal | Integer identifier corresponding to 29 distinc... |
| 8 | admission_source_id | Admission Source | Nominal | Integer identifier corresponding to 21 distinc... |
| 9 | time_in_hospital | Time in Hospital | Numeric | Integer number of days between admission and d... |
| 10 | payer_code | Payer Code | Nominal | Integer identifier corresponding to 23 distinc... |
| 11 | medical_specialty | Medical Specialty | Nominal | Integer identifier of a specialty of the admit... |
| 12 | num_lab_procedures | Number of lab procedures | Numeric | Number of lab tests performed during the encou... |
| 13 | num_procedures | Number of procedures | Numeric | Number of procedures (other than lab tests) pe... |
| 14 | num_medications | Number of medications | Numeric | Number of distinct generic names administered ... |
| 15 | number_outpatient | Number of outpatient visits | Numeric | Number of outpatient visits of the patient in ... |
| 16 | number_emergency | Number of emergency visits | Numeric | Number of emergency visits of the patient in t... |
| 17 | number_inpatient | Number of inpatient visits | Numeric | Number of inpatient visits of the patient in t... |
| 18 | diag_1 | Diagnosis 1 | Nominal | The primary diagnosis (coded as first three di... |
| 19 | diag_2 | Diagnosis 2 | Nominal | Secondary diagnosis (coded as first three digi... |
| 20 | diag_3 | Diagnosis 3 | Nominal | Additional secondary diagnosis (coded as first... |
| 21 | number_diagnoses | Number of diagnoses | Numeric | Number of diagnoses entered to the system |
| 22 | max_glu_serum | Glucose serum test result | Nominal | Indicates the range of the result or if the te... |
| 23 | A1Cresult | A1c test result | Nominal | Indicates the range of the result or if the te... |
| 24 | 24 features for medications | 24 features for medications | Nominal | For the generic names: metformin, repaglinide,... |
| 25 | change | Change of medications | Nominal | Indicates if there was a change in diabetic me... |
| 26 | diabetesMed | Diabetes medications | Nominal | Indicates if there was any diabetic medication... |
| 27 | readmitted | Readmitted | Nominal | Days to inpatient readmission. Values: “<30” i... |
Performed some data manipulation directly in Excel, including:¶
- Changed 'medical_specialty' to 'MED_SPEC_NUM'
- Changed the 3 <string/int> 'diag_x' columns to 'DIAG_CAT_X' columns and converted 858 unique diagnoses into 33 diagnosis categories
- Notes are in the Challenge_1_Training_Data_Conversion.xlsx file on the "Storage" page
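The same recoding could also be done in pandas rather than by hand in Excel. The following is a minimal sketch, not the author's procedure: it assumes a lookup table from diagnosis code to category number. The pairs in `diag_to_cat` are only those visible in the sample rows printed in this notebook, not the full 858-code mapping, and `diag_to_cat` and `toy` are illustrative names.

```python
import pandas as pd

# Partial lookup: only the (diagnosis code -> category) pairs visible in the
# sample rows of this notebook. The full mapping lives in the Excel notes.
diag_to_cat = {
    '414': 10, '518': 16, '820': 24, '537': 17, '790': 23,
    '289': 4, '428': 13, '599': 18, '280': 4,
    '593': 18, '496': 16, '191': 2, '250.41': 3, 'V42': 32,
}

# Toy frame standing in for the raw data (diagnosis codes are stored as strings).
toy = pd.DataFrame({'diag_1': ['414', '518', '820'],
                    'diag_2': ['289', '428', '599'],
                    'diag_3': ['593', '496', '191']})

for i in (1, 2, 3):
    # Codes missing from the lookup become NaN, so gaps are easy to spot.
    toy[f'DIAG_CAT_{i}'] = toy[f'diag_{i}'].map(diag_to_cat)

print(toy)
```

Keeping the mapping in code would make the conversion repeatable when new data arrives.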
#import patient data
df = pd.read_csv('data/Challenge_1_Training_Work_Clean.csv')
df.head(5)
| | encounter_id | patient_nbr | race | gender | age | weight | admission_type_id | discharge_disposition_id | admission_source_id | time_in_hospital | payer_code | medical_specialty | MED_SPEC_NUM | num_lab_procedures | num_procedures | num_medications | number_outpatient | number_emergency | number_inpatient | diag_1 | DIAG_CAT_1 | diag_2 | DIAG_CAT_2 | diag_3 | DIAG_CAT_3 | number_diagnoses | max_glu_serum | A1Cresult | metformin | repaglinide | nateglinide | chlorpropamide | glimepiride | acetohexamide | glipizide | glyburide | tolbutamide | pioglitazone | rosiglitazone | acarbose | miglitol | troglitazone | tolazamide | examide | citoglipton | insulin | glyburide.metformin | glipizide.metformin | glimepiride.pioglitazone | metformin.rosiglitazone | metformin.pioglitazone | change | diabetesMed | readmitted |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 22915332 | 1475073 | Caucasian | Female | [80-90) | ? | 3 | 1 | 4 | 5 | ? | ? | 0 | 39 | 3 | 11 | 0 | 0 | 0 | 414 | 10 | 289 | 4 | 593 | 18 | 7 | None | None | No | No | No | No | No | No | No | Steady | No | No | No | No | No | No | No | No | No | No | No | No | No | No | No | No | Yes | >30 |
| 1 | 158361324 | 93771396 | Caucasian | Female | [70-80) | ? | 5 | 3 | 1 | 6 | MC | ? | 0 | 79 | 1 | 25 | 3 | 0 | 0 | 518 | 16 | 428 | 13 | 496 | 16 | 9 | None | None | Steady | No | No | No | No | No | No | Steady | No | No | No | No | No | No | No | No | No | Up | No | No | No | No | No | Ch | Yes | NO |
| 2 | 120453192 | 24581277 | Other | Female | [60-70) | ? | 1 | 22 | 7 | 4 | SP | InternalMedicine | 18 | 29 | 2 | 18 | 0 | 0 | 1 | 820 | 24 | 599 | 18 | 191 | 2 | 9 | None | None | No | No | No | No | No | No | No | No | No | No | No | No | No | No | No | No | No | Steady | No | No | No | No | No | No | Yes | NO |
| 3 | 25590894 | 5041395 | Caucasian | Male | [70-80) | ? | 1 | 1 | 7 | 3 | ? | InternalMedicine | 18 | 72 | 3 | 18 | 0 | 0 | 0 | 537 | 17 | 280 | 4 | 250.41 | 3 | 9 | None | None | No | No | No | No | No | No | No | No | No | No | No | No | No | No | No | No | No | Steady | No | No | No | No | No | No | Yes | >30 |
| 4 | 154290822 | 49027563 | Caucasian | Female | [30-40) | ? | 2 | 1 | 1 | 3 | ? | ? | 0 | 21 | 1 | 6 | 0 | 0 | 0 | 790 | 23 | 599 | 18 | V42 | 32 | 9 | None | None | No | No | No | No | No | No | No | No | No | No | No | No | No | No | No | No | No | No | No | No | No | No | No | No | No | NO |
# Number of unique values in each column
df.apply(pd.Series.nunique)
encounter_id                56000
patient_nbr                 44369
race                            6
gender                          2
age                            10
weight                         10
admission_type_id               8
discharge_disposition_id       26
admission_source_id            17
time_in_hospital               14
payer_code                     17
medical_specialty              64
MED_SPEC_NUM                   64
num_lab_procedures            114
num_procedures                  7
num_medications                73
number_outpatient              32
number_emergency               25
number_inpatient               21
diag_1                        661
DIAG_CAT_1                     31
diag_2                        668
DIAG_CAT_2                     30
diag_3                        715
DIAG_CAT_3                     30
number_diagnoses               16
max_glu_serum                   4
A1Cresult                       4
metformin                       4
repaglinide                     4
nateglinide                     4
chlorpropamide                  4
glimepiride                     4
acetohexamide                   2
glipizide                       4
glyburide                       4
tolbutamide                     2
pioglitazone                    4
rosiglitazone                   4
acarbose                        4
miglitol                        4
troglitazone                    2
tolazamide                      2
examide                         1
citoglipton                     1
insulin                         4
glyburide.metformin             4
glipizide.metformin             2
glimepiride.pioglitazone        1
metformin.rosiglitazone         2
metformin.pioglitazone          2
change                          2
diabetesMed                     2
readmitted                      3
dtype: int64
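A figure such as the ~5.27x10^41 quoted in the dataset description presumably comes from multiplying the number of distinct values across the factor columns. The sketch below shows that calculation on a made-up frame; `toy` and its values are illustrative, and because the exact set of columns the author multiplied is not stated, the result for the real data is not reproduced here.

```python
import math
import pandas as pd

# Toy frame with three categorical factors (3, 2 and 2 distinct values).
toy = pd.DataFrame({
    'race':   ['Caucasian', 'Other', 'Caucasian', 'Asian'],
    'gender': ['Female', 'Male', 'Female', 'Female'],
    'change': ['No', 'Ch', 'No', 'No'],
})

unique_counts = toy.nunique()

# Multiply with Python integers so a large product cannot overflow int64.
n_combinations = math.prod(int(n) for n in unique_counts)
print(f'{n_combinations:.3g} possible combinations')
```

On the real data, the same product could be taken over `df.apply(pd.Series.nunique)` after excluding the identifier and target columns.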
#show the information about the data
df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 56000 entries, 0 to 55999
Data columns (total 54 columns):
encounter_id                56000 non-null int64
patient_nbr                 56000 non-null int64
race                        56000 non-null object
gender                      56000 non-null object
age                         56000 non-null object
weight                      56000 non-null object
admission_type_id           56000 non-null int64
discharge_disposition_id    56000 non-null int64
admission_source_id         56000 non-null int64
time_in_hospital            56000 non-null int64
payer_code                  56000 non-null object
medical_specialty           56000 non-null object
MED_SPEC_NUM                56000 non-null int64
num_lab_procedures          56000 non-null int64
num_procedures              56000 non-null int64
num_medications             56000 non-null int64
number_outpatient           56000 non-null int64
number_emergency            56000 non-null int64
number_inpatient            56000 non-null int64
diag_1                      56000 non-null object
DIAG_CAT_1                  56000 non-null int64
diag_2                      56000 non-null object
DIAG_CAT_2                  56000 non-null int64
diag_3                      56000 non-null object
DIAG_CAT_3                  56000 non-null int64
number_diagnoses            56000 non-null int64
max_glu_serum               56000 non-null object
A1Cresult                   56000 non-null object
metformin                   56000 non-null object
repaglinide                 56000 non-null object
nateglinide                 56000 non-null object
chlorpropamide              56000 non-null object
glimepiride                 56000 non-null object
acetohexamide               56000 non-null object
glipizide                   56000 non-null object
glyburide                   56000 non-null object
tolbutamide                 56000 non-null object
pioglitazone                56000 non-null object
rosiglitazone               56000 non-null object
acarbose                    56000 non-null object
miglitol                    56000 non-null object
troglitazone                56000 non-null object
tolazamide                  56000 non-null object
examide                     56000 non-null object
citoglipton                 56000 non-null object
insulin                     56000 non-null object
glyburide.metformin         56000 non-null object
glipizide.metformin         56000 non-null object
glimepiride.pioglitazone    56000 non-null object
metformin.rosiglitazone     56000 non-null object
metformin.pioglitazone      56000 non-null object
change                      56000 non-null object
diabetesMed                 56000 non-null object
readmitted                  56000 non-null object
dtypes: int64(17), object(37)
memory usage: 23.1+ MB
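Although `df.info()` reports no null values, the sample rows use `?` as a placeholder in columns such as `weight` and `payer_code`, so missing data is hidden from the non-null counts. A minimal sketch of how the placeholders could be counted, shown on a made-up frame (`toy` and its values are illustrative):

```python
import pandas as pd

# Toy frame mimicking the '?' placeholder seen in the sample rows.
toy = pd.DataFrame({
    'weight':     ['?', '?', '?', '?'],
    'payer_code': ['?', 'MC', 'SP', '?'],
    'gender':     ['Female', 'Female', 'Male', 'Female'],
})

# Element-wise comparison gives a boolean frame; summing counts the True cells per column.
placeholder_counts = (toy == '?').sum()
placeholder_share = placeholder_counts / len(toy)
print(placeholder_share.sort_values(ascending=False))
```

Columns where most values are placeholders are candidates for being dropped or treated as their own category.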
#describe the column readmitted only (e.g., count, unique, frequency)
df['readmitted'].describe()
count     56000
unique        3
top          NO
freq      30238
Name: readmitted, dtype: object
#distribution of the three classes ('<30', '>30', 'NO') in the readmitted column
df.groupby('readmitted').count()
| readmitted | encounter_id | patient_nbr | race | gender | age | weight | admission_type_id | discharge_disposition_id | admission_source_id | time_in_hospital | payer_code | medical_specialty | MED_SPEC_NUM | num_lab_procedures | num_procedures | num_medications | number_outpatient | number_emergency | number_inpatient | diag_1 | DIAG_CAT_1 | diag_2 | DIAG_CAT_2 | diag_3 | DIAG_CAT_3 | number_diagnoses | max_glu_serum | A1Cresult | metformin | repaglinide | nateglinide | chlorpropamide | glimepiride | acetohexamide | glipizide | glyburide | tolbutamide | pioglitazone | rosiglitazone | acarbose | miglitol | troglitazone | tolazamide | examide | citoglipton | insulin | glyburide.metformin | glipizide.metformin | glimepiride.pioglitazone | metformin.rosiglitazone | metformin.pioglitazone | change | diabetesMed |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| <30 | 6285 | 6285 | 6285 | 6285 | 6285 | 6285 | 6285 | 6285 | 6285 | 6285 | 6285 | 6285 | 6285 | 6285 | 6285 | 6285 | 6285 | 6285 | 6285 | 6285 | 6285 | 6285 | 6285 | 6285 | 6285 | 6285 | 6285 | 6285 | 6285 | 6285 | 6285 | 6285 | 6285 | 6285 | 6285 | 6285 | 6285 | 6285 | 6285 | 6285 | 6285 | 6285 | 6285 | 6285 | 6285 | 6285 | 6285 | 6285 | 6285 | 6285 | 6285 | 6285 | 6285 |
| >30 | 19477 | 19477 | 19477 | 19477 | 19477 | 19477 | 19477 | 19477 | 19477 | 19477 | 19477 | 19477 | 19477 | 19477 | 19477 | 19477 | 19477 | 19477 | 19477 | 19477 | 19477 | 19477 | 19477 | 19477 | 19477 | 19477 | 19477 | 19477 | 19477 | 19477 | 19477 | 19477 | 19477 | 19477 | 19477 | 19477 | 19477 | 19477 | 19477 | 19477 | 19477 | 19477 | 19477 | 19477 | 19477 | 19477 | 19477 | 19477 | 19477 | 19477 | 19477 | 19477 | 19477 |
| NO | 30238 | 30238 | 30238 | 30238 | 30238 | 30238 | 30238 | 30238 | 30238 | 30238 | 30238 | 30238 | 30238 | 30238 | 30238 | 30238 | 30238 | 30238 | 30238 | 30238 | 30238 | 30238 | 30238 | 30238 | 30238 | 30238 | 30238 | 30238 | 30238 | 30238 | 30238 | 30238 | 30238 | 30238 | 30238 | 30238 | 30238 | 30238 | 30238 | 30238 | 30238 | 30238 | 30238 | 30238 | 30238 | 30238 | 30238 | 30238 | 30238 | 30238 | 30238 | 30238 | 30238 |
#replace the values of the 'readmitted' column:
# NO = 0
# >30 = 1
# <30 = 2
df = df.replace({'readmitted': {'NO': 0, '>30': 1, '<30': 2}})
df.head(2)
| | encounter_id | patient_nbr | race | gender | age | weight | admission_type_id | discharge_disposition_id | admission_source_id | time_in_hospital | payer_code | medical_specialty | MED_SPEC_NUM | num_lab_procedures | num_procedures | num_medications | number_outpatient | number_emergency | number_inpatient | diag_1 | DIAG_CAT_1 | diag_2 | DIAG_CAT_2 | diag_3 | DIAG_CAT_3 | number_diagnoses | max_glu_serum | A1Cresult | metformin | repaglinide | nateglinide | chlorpropamide | glimepiride | acetohexamide | glipizide | glyburide | tolbutamide | pioglitazone | rosiglitazone | acarbose | miglitol | troglitazone | tolazamide | examide | citoglipton | insulin | glyburide.metformin | glipizide.metformin | glimepiride.pioglitazone | metformin.rosiglitazone | metformin.pioglitazone | change | diabetesMed | readmitted |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 22915332 | 1475073 | Caucasian | Female | [80-90) | ? | 3 | 1 | 4 | 5 | ? | ? | 0 | 39 | 3 | 11 | 0 | 0 | 0 | 414 | 10 | 289 | 4 | 593 | 18 | 7 | None | None | No | No | No | No | No | No | No | Steady | No | No | No | No | No | No | No | No | No | No | No | No | No | No | No | No | Yes | 1 |
| 1 | 158361324 | 93771396 | Caucasian | Female | [70-80) | ? | 5 | 3 | 1 | 6 | MC | ? | 0 | 79 | 1 | 25 | 3 | 0 | 0 | 518 | 16 | 428 | 13 | 496 | 16 | 9 | None | None | Steady | No | No | No | No | No | No | Steady | No | No | No | No | No | No | No | No | No | Up | No | No | No | No | No | Ch | Yes | 0 |
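Because the business objective concerns only re-admissions within 30 days, a binary target may be convenient for modeling. The following is a minimal sketch that assumes the 0/1/2 coding defined above; the column name `readmitted_lt30` and the frame `toy` are hypothetical and not used elsewhere in this notebook.

```python
import pandas as pd

# Toy frame with the three raw labels of the 'readmitted' column.
toy = pd.DataFrame({'readmitted': ['NO', '>30', '<30', 'NO']})

# Same coding as in the notebook: NO = 0, >30 = 1, <30 = 2.
codes = {'NO': 0, '>30': 1, '<30': 2}
toy['readmitted'] = toy['readmitted'].map(codes)

# Hypothetical binary target: 1 only for re-admission in less than 30 days.
toy['readmitted_lt30'] = (toy['readmitted'] == 2).astype(int)
print(toy)
```

Whether to model three classes or this two-class target depends on how the later analysis treats the ">30" group.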
Key Business Data Question Summary¶
- Of the 56,000 hospital visits in this dataset:
- 6,285 were re-admitted in less than 30 days; these are the cases this analysis needs to address
- 19,477 were also re-admitted, but after the 30-day threshold
- 30,238 were not re-admitted; there may also be insight to glean from why they DIDN'T have to be re-admitted
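Expressed as shares of all visits, the counts above show how unbalanced the classes are. A minimal sketch using only the three counts listed in this summary (`counts` and `shares` are illustrative names):

```python
import pandas as pd

# Class counts taken from the summary above.
counts = pd.Series({'<30': 6285, '>30': 19477, 'NO': 30238})

# Percentage of all visits in each class, rounded to one decimal place.
shares = (counts / counts.sum() * 100).round(1)
print(shares)
```

On the full frame, `df['readmitted'].value_counts(normalize=True)` gives the same proportions directly. Only about 11% of visits fall in the "<30" class, which matters when judging classifier accuracy later.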
# input ETL slide here
Image('Images/christensen_finalprj/Slide8.png')
Data understanding & processing (ETL)¶
# Drop 'encounter_id' and 'patient_nbr' since these identifiers are not used in the analysis,
# and 'medical_specialty' since it is represented by 'MED_SPEC_NUM'; then display the result
df = df.drop(['encounter_id', 'patient_nbr', 'medical_specialty'], axis=1)
# Drop 'diag_1', 'diag_2' and 'diag_3' since these values have been put into categories
# in columns 'DIAG_CAT_1', 'DIAG_CAT_2' and 'DIAG_CAT_3'
df = df.drop(['diag_1', 'diag_2', 'diag_3'], axis=1)
df.head(5)
| | race | gender | age | weight | admission_type_id | discharge_disposition_id | admission_source_id | time_in_hospital | payer_code | MED_SPEC_NUM | num_lab_procedures | num_procedures | num_medications | number_outpatient | number_emergency | number_inpatient | DIAG_CAT_1 | DIAG_CAT_2 | DIAG_CAT_3 | number_diagnoses | max_glu_serum | A1Cresult | metformin | repaglinide | nateglinide | chlorpropamide | glimepiride | acetohexamide | glipizide | glyburide | tolbutamide | pioglitazone | rosiglitazone | acarbose | miglitol | troglitazone | tolazamide | examide | citoglipton | insulin | glyburide.metformin | glipizide.metformin | glimepiride.pioglitazone | metformin.rosiglitazone | metformin.pioglitazone | change | diabetesMed | readmitted |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Caucasian | Female | [80-90) | ? | 3 | 1 | 4 | 5 | ? | 0 | 39 | 3 | 11 | 0 | 0 | 0 | 10 | 4 | 18 | 7 | None | None | No | No | No | No | No | No | No | Steady | No | No | No | No | No | No | No | No | No | No | No | No | No | No | No | No | Yes | 1 |
| 1 | Caucasian | Female | [70-80) | ? | 5 | 3 | 1 | 6 | MC | 0 | 79 | 1 | 25 | 3 | 0 | 0 | 16 | 13 | 16 | 9 | None | None | Steady | No | No | No | No | No | No | Steady | No | No | No | No | No | No | No | No | No | Up | No | No | No | No | No | Ch | Yes | 0 |
| 2 | Other | Female | [60-70) | ? | 1 | 22 | 7 | 4 | SP | 18 | 29 | 2 | 18 | 0 | 0 | 1 | 24 | 18 | 2 | 9 | None | None | No | No | No | No | No | No | No | No | No | No | No | No | No | No | No | No | No | Steady | No | No | No | No | No | No | Yes | 0 |
| 3 | Caucasian | Male | [70-80) | ? | 1 | 1 | 7 | 3 | ? | 18 | 72 | 3 | 18 | 0 | 0 | 0 | 17 | 4 | 3 | 9 | None | None | No | No | No | No | No | No | No | No | No | No | No | No | No | No | No | No | No | Steady | No | No | No | No | No | No | Yes | 1 |
| 4 | Caucasian | Female | [30-40) | ? | 2 | 1 | 1 | 3 | ? | 0 | 21 | 1 | 6 | 0 | 0 | 0 | 23 | 18 | 32 | 9 | None | None | No | No | No | No | No | No | No | No | No | No | No | No | No | No | No | No | No | No | No | No | No | No | No | No | No | 0 |
#distribution of races in the race column
df.groupby('race').count()
| race | gender | age | weight | admission_type_id | discharge_disposition_id | admission_source_id | time_in_hospital | payer_code | MED_SPEC_NUM | num_lab_procedures | num_procedures | num_medications | number_outpatient | number_emergency | number_inpatient | DIAG_CAT_1 | DIAG_CAT_2 | DIAG_CAT_3 | number_diagnoses | max_glu_serum | A1Cresult | metformin | repaglinide | nateglinide | chlorpropamide | glimepiride | acetohexamide | glipizide | glyburide | tolbutamide | pioglitazone | rosiglitazone | acarbose | miglitol | troglitazone | tolazamide | examide | citoglipton | insulin | glyburide.metformin | glipizide.metformin | glimepiride.pioglitazone | metformin.rosiglitazone | metformin.pioglitazone | change | diabetesMed | readmitted |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| ? | 1215 | 1215 | 1215 | 1215 | 1215 | 1215 | 1215 | 1215 | 1215 | 1215 | 1215 | 1215 | 1215 | 1215 | 1215 | 1215 | 1215 | 1215 | 1215 | 1215 | 1215 | 1215 | 1215 | 1215 | 1215 | 1215 | 1215 | 1215 | 1215 | 1215 | 1215 | 1215 | 1215 | 1215 | 1215 | 1215 | 1215 | 1215 | 1215 | 1215 | 1215 | 1215 | 1215 | 1215 | 1215 | 1215 | 1215 |
| AfricanAmerican | 10563 | 10563 | 10563 | 10563 | 10563 | 10563 | 10563 | 10563 | 10563 | 10563 | 10563 | 10563 | 10563 | 10563 | 10563 | 10563 | 10563 | 10563 | 10563 | 10563 | 10563 | 10563 | 10563 | 10563 | 10563 | 10563 | 10563 | 10563 | 10563 | 10563 | 10563 | 10563 | 10563 | 10563 | 10563 | 10563 | 10563 | 10563 | 10563 | 10563 | 10563 | 10563 | 10563 | 10563 | 10563 | 10563 | 10563 |
| Asian | 356 | 356 | 356 | 356 | 356 | 356 | 356 | 356 | 356 | 356 | 356 | 356 | 356 | 356 | 356 | 356 | 356 | 356 | 356 | 356 | 356 | 356 | 356 | 356 | 356 | 356 | 356 | 356 | 356 | 356 | 356 | 356 | 356 | 356 | 356 | 356 | 356 | 356 | 356 | 356 | 356 | 356 | 356 | 356 | 356 | 356 | 356 |
| Caucasian | 41886 | 41886 | 41886 | 41886 | 41886 | 41886 | 41886 | 41886 | 41886 | 41886 | 41886 | 41886 | 41886 | 41886 | 41886 | 41886 | 41886 | 41886 | 41886 | 41886 | 41886 | 41886 | 41886 | 41886 | 41886 | 41886 | 41886 | 41886 | 41886 | 41886 | 41886 | 41886 | 41886 | 41886 | 41886 | 41886 | 41886 | 41886 | 41886 | 41886 | 41886 | 41886 | 41886 | 41886 | 41886 | 41886 | 41886 |
| Hispanic | 1117 | 1117 | 1117 | 1117 | 1117 | 1117 | 1117 | 1117 | 1117 | 1117 | 1117 | 1117 | 1117 | 1117 | 1117 | 1117 | 1117 | 1117 | 1117 | 1117 | 1117 | 1117 | 1117 | 1117 | 1117 | 1117 | 1117 | 1117 | 1117 | 1117 | 1117 | 1117 | 1117 | 1117 | 1117 | 1117 | 1117 | 1117 | 1117 | 1117 | 1117 | 1117 | 1117 | 1117 | 1117 | 1117 | 1117 |
| Other | 863 | 863 | 863 | 863 | 863 | 863 | 863 | 863 | 863 | 863 | 863 | 863 | 863 | 863 | 863 | 863 | 863 | 863 | 863 | 863 | 863 | 863 | 863 | 863 | 863 | 863 | 863 | 863 | 863 | 863 | 863 | 863 | 863 | 863 | 863 | 863 | 863 | 863 | 863 | 863 | 863 | 863 | 863 | 863 | 863 | 863 | 863 |
#replace the values of the 'race' column:
# ? = 0
# AfricanAmerican = 1
# Asian = 2
# Caucasian = 3
# Hispanic = 4
# Other = 5
df = df.replace({'race': {'?': 0, 'AfricanAmerican': 1, 'Asian': 2,'Caucasian': 3,'Hispanic': 4,'Other': 5}})
df.head(2)
| | race | gender | age | weight | admission_type_id | discharge_disposition_id | admission_source_id | time_in_hospital | payer_code | MED_SPEC_NUM | num_lab_procedures | num_procedures | num_medications | number_outpatient | number_emergency | number_inpatient | DIAG_CAT_1 | DIAG_CAT_2 | DIAG_CAT_3 | number_diagnoses | max_glu_serum | A1Cresult | metformin | repaglinide | nateglinide | chlorpropamide | glimepiride | acetohexamide | glipizide | glyburide | tolbutamide | pioglitazone | rosiglitazone | acarbose | miglitol | troglitazone | tolazamide | examide | citoglipton | insulin | glyburide.metformin | glipizide.metformin | glimepiride.pioglitazone | metformin.rosiglitazone | metformin.pioglitazone | change | diabetesMed | readmitted |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 3 | Female | [80-90) | ? | 3 | 1 | 4 | 5 | ? | 0 | 39 | 3 | 11 | 0 | 0 | 0 | 10 | 4 | 18 | 7 | None | None | No | No | No | No | No | No | No | Steady | No | No | No | No | No | No | No | No | No | No | No | No | No | No | No | No | Yes | 1 |
| 1 | 3 | Female | [70-80) | ? | 5 | 3 | 1 | 6 | MC | 0 | 79 | 1 | 25 | 3 | 0 | 0 | 16 | 13 | 16 | 9 | None | None | Steady | No | No | No | No | No | No | Steady | No | No | No | No | No | No | No | No | No | Up | No | No | No | No | No | Ch | Yes | 0 |
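Race is a nominal factor, so the integer codes above carry an ordering that distance-based or linear models (for example KNN or logistic regression) may treat as meaningful. One common alternative is one-hot encoding. The sketch below is an illustration on a made-up frame, not part of the original analysis; `toy` and `race_dummies` are illustrative names.

```python
import pandas as pd

# Toy frame with raw race labels as they appear before recoding.
toy = pd.DataFrame({'race': ['Caucasian', 'Other', 'AfricanAmerican', 'Caucasian']})

# One indicator column per category, so no artificial order is implied.
race_dummies = pd.get_dummies(toy['race'], prefix='race')
toy = pd.concat([toy.drop('race', axis=1), race_dummies], axis=1)
print(toy)
```

Tree-based models are generally less sensitive to this choice, so the integer coding may be adequate for the decision tree and random forest classifiers imported earlier.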
#distribution of genders in the gender column
df.groupby('gender').count()
| gender | race | age | weight | admission_type_id | discharge_disposition_id | admission_source_id | time_in_hospital | payer_code | MED_SPEC_NUM | num_lab_procedures | num_procedures | num_medications | number_outpatient | number_emergency | number_inpatient | DIAG_CAT_1 | DIAG_CAT_2 | DIAG_CAT_3 | number_diagnoses | max_glu_serum | A1Cresult | metformin | repaglinide | nateglinide | chlorpropamide | glimepiride | acetohexamide | glipizide | glyburide | tolbutamide | pioglitazone | rosiglitazone | acarbose | miglitol | troglitazone | tolazamide | examide | citoglipton | insulin | glyburide.metformin | glipizide.metformin | glimepiride.pioglitazone | metformin.rosiglitazone | metformin.pioglitazone | change | diabetesMed | readmitted |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Female | 29990 | 29990 | 29990 | 29990 | 29990 | 29990 | 29990 | 29990 | 29990 | 29990 | 29990 | 29990 | 29990 | 29990 | 29990 | 29990 | 29990 | 29990 | 29990 | 29990 | 29990 | 29990 | 29990 | 29990 | 29990 | 29990 | 29990 | 29990 | 29990 | 29990 | 29990 | 29990 | 29990 | 29990 | 29990 | 29990 | 29990 | 29990 | 29990 | 29990 | 29990 | 29990 | 29990 | 29990 | 29990 | 29990 | 29990 |
| Male | 26010 | 26010 | 26010 | 26010 | 26010 | 26010 | 26010 | 26010 | 26010 | 26010 | 26010 | 26010 | 26010 | 26010 | 26010 | 26010 | 26010 | 26010 | 26010 | 26010 | 26010 | 26010 | 26010 | 26010 | 26010 | 26010 | 26010 | 26010 | 26010 | 26010 | 26010 | 26010 | 26010 | 26010 | 26010 | 26010 | 26010 | 26010 | 26010 | 26010 | 26010 | 26010 | 26010 | 26010 | 26010 | 26010 | 26010 |
#replace the values of the 'gender' column:
# Female = 0
# Male = 1
df = df.replace({'gender': {'Male': 1, 'Female': 0}})
df.head(2)
| | race | gender | age | weight | admission_type_id | discharge_disposition_id | admission_source_id | time_in_hospital | payer_code | MED_SPEC_NUM | num_lab_procedures | num_procedures | num_medications | number_outpatient | number_emergency | number_inpatient | DIAG_CAT_1 | DIAG_CAT_2 | DIAG_CAT_3 | number_diagnoses | max_glu_serum | A1Cresult | metformin | repaglinide | nateglinide | chlorpropamide | glimepiride | acetohexamide | glipizide | glyburide | tolbutamide | pioglitazone | rosiglitazone | acarbose | miglitol | troglitazone | tolazamide | examide | citoglipton | insulin | glyburide.metformin | glipizide.metformin | glimepiride.pioglitazone | metformin.rosiglitazone | metformin.pioglitazone | change | diabetesMed | readmitted |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 3 | 0 | [80-90) | ? | 3 | 1 | 4 | 5 | ? | 0 | 39 | 3 | 11 | 0 | 0 | 0 | 10 | 4 | 18 | 7 | None | None | No | No | No | No | No | No | No | Steady | No | No | No | No | No | No | No | No | No | No | No | No | No | No | No | No | Yes | 1 |
| 1 | 3 | 0 | [70-80) | ? | 5 | 3 | 1 | 6 | MC | 0 | 79 | 1 | 25 | 3 | 0 | 0 | 16 | 13 | 16 | 9 | None | None | Steady | No | No | No | No | No | No | Steady | No | No | No | No | No | No | No | No | No | Up | No | No | No | No | No | Ch | Yes | 0 |
#distribution of decade age categories in the age column
df.groupby('age').count()
| age | race | gender | weight | admission_type_id | discharge_disposition_id | admission_source_id | time_in_hospital | payer_code | MED_SPEC_NUM | num_lab_procedures | num_procedures | num_medications | number_outpatient | number_emergency | number_inpatient | DIAG_CAT_1 | DIAG_CAT_2 | DIAG_CAT_3 | number_diagnoses | max_glu_serum | A1Cresult | metformin | repaglinide | nateglinide | chlorpropamide | glimepiride | acetohexamide | glipizide | glyburide | tolbutamide | pioglitazone | rosiglitazone | acarbose | miglitol | troglitazone | tolazamide | examide | citoglipton | insulin | glyburide.metformin | glipizide.metformin | glimepiride.pioglitazone | metformin.rosiglitazone | metformin.pioglitazone | change | diabetesMed | readmitted |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| [0-10) | 98 | 98 | 98 | 98 | 98 | 98 | 98 | 98 | 98 | 98 | 98 | 98 | 98 | 98 | 98 | 98 | 98 | 98 | 98 | 98 | 98 | 98 | 98 | 98 | 98 | 98 | 98 | 98 | 98 | 98 | 98 | 98 | 98 | 98 | 98 | 98 | 98 | 98 | 98 | 98 | 98 | 98 | 98 | 98 | 98 | 98 | 98 |
| [10-20) | 355 | 355 | 355 | 355 | 355 | 355 | 355 | 355 | 355 | 355 | 355 | 355 | 355 | 355 | 355 | 355 | 355 | 355 | 355 | 355 | 355 | 355 | 355 | 355 | 355 | 355 | 355 | 355 | 355 | 355 | 355 | 355 | 355 | 355 | 355 | 355 | 355 | 355 | 355 | 355 | 355 | 355 | 355 | 355 | 355 | 355 | 355 |
| [20-30) | 934 | 934 | 934 | 934 | 934 | 934 | 934 | 934 | 934 | 934 | 934 | 934 | 934 | 934 | 934 | 934 | 934 | 934 | 934 | 934 | 934 | 934 | 934 | 934 | 934 | 934 | 934 | 934 | 934 | 934 | 934 | 934 | 934 | 934 | 934 | 934 | 934 | 934 | 934 | 934 | 934 | 934 | 934 | 934 | 934 | 934 | 934 |
| [30-40) | 2070 | 2070 | 2070 | 2070 | 2070 | 2070 | 2070 | 2070 | 2070 | 2070 | 2070 | 2070 | 2070 | 2070 | 2070 | 2070 | 2070 | 2070 | 2070 | 2070 | 2070 | 2070 | 2070 | 2070 | 2070 | 2070 | 2070 | 2070 | 2070 | 2070 | 2070 | 2070 | 2070 | 2070 | 2070 | 2070 | 2070 | 2070 | 2070 | 2070 | 2070 | 2070 | 2070 | 2070 | 2070 | 2070 | 2070 |
| [40-50) | 5237 | 5237 | 5237 | 5237 | 5237 | 5237 | 5237 | 5237 | 5237 | 5237 | 5237 | 5237 | 5237 | 5237 | 5237 | 5237 | 5237 | 5237 | 5237 | 5237 | 5237 | 5237 | 5237 | 5237 | 5237 | 5237 | 5237 | 5237 | 5237 | 5237 | 5237 | 5237 | 5237 | 5237 | 5237 | 5237 | 5237 | 5237 | 5237 | 5237 | 5237 | 5237 | 5237 | 5237 | 5237 | 5237 | 5237 |
| [50-60) | 9578 | 9578 | 9578 | 9578 | 9578 | 9578 | 9578 | 9578 | 9578 | 9578 | 9578 | 9578 | 9578 | 9578 | 9578 | 9578 | 9578 | 9578 | 9578 | 9578 | 9578 | 9578 | 9578 | 9578 | 9578 | 9578 | 9578 | 9578 | 9578 | 9578 | 9578 | 9578 | 9578 | 9578 | 9578 | 9578 | 9578 | 9578 | 9578 | 9578 | 9578 | 9578 | 9578 | 9578 | 9578 | 9578 | 9578 |
| [60-70) | 12422 | 12422 | 12422 | 12422 | 12422 | 12422 | 12422 | 12422 | 12422 | 12422 | 12422 | 12422 | 12422 | 12422 | 12422 | 12422 | 12422 | 12422 | 12422 | 12422 | 12422 | 12422 | 12422 | 12422 | 12422 | 12422 | 12422 | 12422 | 12422 | 12422 | 12422 | 12422 | 12422 | 12422 | 12422 | 12422 | 12422 | 12422 | 12422 | 12422 | 12422 | 12422 | 12422 | 12422 | 12422 | 12422 | 12422 |
| [70-80) | 14356 | 14356 | 14356 | 14356 | 14356 | 14356 | 14356 | 14356 | 14356 | 14356 | 14356 | 14356 | 14356 | 14356 | 14356 | 14356 | 14356 | 14356 | 14356 | 14356 | 14356 | 14356 | 14356 | 14356 | 14356 | 14356 | 14356 | 14356 | 14356 | 14356 | 14356 | 14356 | 14356 | 14356 | 14356 | 14356 | 14356 | 14356 | 14356 | 14356 | 14356 | 14356 | 14356 | 14356 | 14356 | 14356 | 14356 |
| [80-90) | 9436 | 9436 | 9436 | 9436 | 9436 | 9436 | 9436 | 9436 | 9436 | 9436 | 9436 | 9436 | 9436 | 9436 | 9436 | 9436 | 9436 | 9436 | 9436 | 9436 | 9436 | 9436 | 9436 | 9436 | 9436 | 9436 | 9436 | 9436 | 9436 | 9436 | 9436 | 9436 | 9436 | 9436 | 9436 | 9436 | 9436 | 9436 | 9436 | 9436 | 9436 | 9436 | 9436 | 9436 | 9436 | 9436 | 9436 |
| [90-100) | 1514 | 1514 | 1514 | 1514 | 1514 | 1514 | 1514 | 1514 | 1514 | 1514 | 1514 | 1514 | 1514 | 1514 | 1514 | 1514 | 1514 | 1514 | 1514 | 1514 | 1514 | 1514 | 1514 | 1514 | 1514 | 1514 | 1514 | 1514 | 1514 | 1514 | 1514 | 1514 | 1514 | 1514 | 1514 | 1514 | 1514 | 1514 | 1514 | 1514 | 1514 | 1514 | 1514 | 1514 | 1514 | 1514 | 1514 |
# Recode the 'age' bands as ordinal integers 0-9:
# [0-10) = 0
# [10-20) = 1
# [20-30) = 2
# [30-40) = 3
# [40-50) = 4
# [50-60) = 5
# [60-70) = 6
# [70-80) = 7
# [80-90) = 8
# [90-100) = 9
df = df.replace({'age': {'[0-10)': 0, '[10-20)': 1, '[20-30)': 2, '[30-40)': 3, '[40-50)': 4, '[50-60)': 5, '[60-70)': 6, '[70-80)': 7, '[80-90)': 8, '[90-100)': 9}})
df.head(2)
| | race | gender | age | weight | admission_type_id | discharge_disposition_id | admission_source_id | time_in_hospital | payer_code | MED_SPEC_NUM | num_lab_procedures | num_procedures | num_medications | number_outpatient | number_emergency | number_inpatient | DIAG_CAT_1 | DIAG_CAT_2 | DIAG_CAT_3 | number_diagnoses | max_glu_serum | A1Cresult | metformin | repaglinide | nateglinide | chlorpropamide | glimepiride | acetohexamide | glipizide | glyburide | tolbutamide | pioglitazone | rosiglitazone | acarbose | miglitol | troglitazone | tolazamide | examide | citoglipton | insulin | glyburide.metformin | glipizide.metformin | glimepiride.pioglitazone | metformin.rosiglitazone | metformin.pioglitazone | change | diabetesMed | readmitted |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 3 | 0 | 8 | ? | 3 | 1 | 4 | 5 | ? | 0 | 39 | 3 | 11 | 0 | 0 | 0 | 10 | 4 | 18 | 7 | None | None | No | No | No | No | No | No | No | Steady | No | No | No | No | No | No | No | No | No | No | No | No | No | No | No | No | Yes | 1 |
| 1 | 3 | 0 | 7 | ? | 5 | 3 | 1 | 6 | MC | 0 | 79 | 1 | 25 | 3 | 0 | 0 | 16 | 13 | 16 | 9 | None | None | Steady | No | No | No | No | No | No | Steady | No | No | No | No | No | No | No | No | No | Up | No | No | No | No | No | Ch | Yes | 0 |
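The band-to-integer recoding above can also be written with `Series.map`, which turns any label missing from the mapping into NaN instead of silently leaving it as a string. The sketch below is only an illustration on a small hypothetical frame named `toy` (not the notebook's `df`); the mapping itself reproduces the one used for `age`.

```python
import pandas as pd

# Hypothetical stand-in for the notebook's df; only the 'age' column matters here.
toy = pd.DataFrame({'age': ['[0-10)', '[70-80)', '[90-100)', '[40-50)']})

# Build the same band -> code mapping as above: '[0-10)' -> 0, ..., '[90-100)' -> 9.
age_codes = {'[{}-{})'.format(lo, lo + 10): lo // 10 for lo in range(0, 100, 10)}

# Series.map yields NaN for labels that are not in the mapping,
# so an unexpected band surfaces here instead of passing through unchanged.
toy['age_code'] = toy['age'].map(age_codes)
assert toy['age_code'].notna().all(), 'unmapped age band found'
toy['age_code'] = toy['age_code'].astype(int)
```

Generating the dictionary from `range` avoids typing ten near-identical keys by hand, at the cost of being slightly less explicit than the literal mapping.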
#distribution of weight categories in the weight column
df.groupby('weight').count()
| weight | race | gender | age | admission_type_id | discharge_disposition_id | admission_source_id | time_in_hospital | payer_code | MED_SPEC_NUM | num_lab_procedures | num_procedures | num_medications | number_outpatient | number_emergency | number_inpatient | DIAG_CAT_1 | DIAG_CAT_2 | DIAG_CAT_3 | number_diagnoses | max_glu_serum | A1Cresult | metformin | repaglinide | nateglinide | chlorpropamide | glimepiride | acetohexamide | glipizide | glyburide | tolbutamide | pioglitazone | rosiglitazone | acarbose | miglitol | troglitazone | tolazamide | examide | citoglipton | insulin | glyburide.metformin | glipizide.metformin | glimepiride.pioglitazone | metformin.rosiglitazone | metformin.pioglitazone | change | diabetesMed | readmitted |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| >200 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 2 |
| ? | 54238 | 54238 | 54238 | 54238 | 54238 | 54238 | 54238 | 54238 | 54238 | 54238 | 54238 | 54238 | 54238 | 54238 | 54238 | 54238 | 54238 | 54238 | 54238 | 54238 | 54238 | 54238 | 54238 | 54238 | 54238 | 54238 | 54238 | 54238 | 54238 | 54238 | 54238 | 54238 | 54238 | 54238 | 54238 | 54238 | 54238 | 54238 | 54238 | 54238 | 54238 | 54238 | 54238 | 54238 | 54238 | 54238 | 54238 |
| [0-25) | 27 | 27 | 27 | 27 | 27 | 27 | 27 | 27 | 27 | 27 | 27 | 27 | 27 | 27 | 27 | 27 | 27 | 27 | 27 | 27 | 27 | 27 | 27 | 27 | 27 | 27 | 27 | 27 | 27 | 27 | 27 | 27 | 27 | 27 | 27 | 27 | 27 | 27 | 27 | 27 | 27 | 27 | 27 | 27 | 27 | 27 | 27 |
| [100-125) | 349 | 349 | 349 | 349 | 349 | 349 | 349 | 349 | 349 | 349 | 349 | 349 | 349 | 349 | 349 | 349 | 349 | 349 | 349 | 349 | 349 | 349 | 349 | 349 | 349 | 349 | 349 | 349 | 349 | 349 | 349 | 349 | 349 | 349 | 349 | 349 | 349 | 349 | 349 | 349 | 349 | 349 | 349 | 349 | 349 | 349 | 349 |
| [125-150) | 67 | 67 | 67 | 67 | 67 | 67 | 67 | 67 | 67 | 67 | 67 | 67 | 67 | 67 | 67 | 67 | 67 | 67 | 67 | 67 | 67 | 67 | 67 | 67 | 67 | 67 | 67 | 67 | 67 | 67 | 67 | 67 | 67 | 67 | 67 | 67 | 67 | 67 | 67 | 67 | 67 | 67 | 67 | 67 | 67 | 67 | 67 |
| [150-175) | 19 | 19 | 19 | 19 | 19 | 19 | 19 | 19 | 19 | 19 | 19 | 19 | 19 | 19 | 19 | 19 | 19 | 19 | 19 | 19 | 19 | 19 | 19 | 19 | 19 | 19 | 19 | 19 | 19 | 19 | 19 | 19 | 19 | 19 | 19 | 19 | 19 | 19 | 19 | 19 | 19 | 19 | 19 | 19 | 19 | 19 | 19 |
| [175-200) | 7 | 7 | 7 | 7 | 7 | 7 | 7 | 7 | 7 | 7 | 7 | 7 | 7 | 7 | 7 | 7 | 7 | 7 | 7 | 7 | 7 | 7 | 7 | 7 | 7 | 7 | 7 | 7 | 7 | 7 | 7 | 7 | 7 | 7 | 7 | 7 | 7 | 7 | 7 | 7 | 7 | 7 | 7 | 7 | 7 | 7 | 7 |
| [25-50) | 58 | 58 | 58 | 58 | 58 | 58 | 58 | 58 | 58 | 58 | 58 | 58 | 58 | 58 | 58 | 58 | 58 | 58 | 58 | 58 | 58 | 58 | 58 | 58 | 58 | 58 | 58 | 58 | 58 | 58 | 58 | 58 | 58 | 58 | 58 | 58 | 58 | 58 | 58 | 58 | 58 | 58 | 58 | 58 | 58 | 58 | 58 |
| [50-75) | 488 | 488 | 488 | 488 | 488 | 488 | 488 | 488 | 488 | 488 | 488 | 488 | 488 | 488 | 488 | 488 | 488 | 488 | 488 | 488 | 488 | 488 | 488 | 488 | 488 | 488 | 488 | 488 | 488 | 488 | 488 | 488 | 488 | 488 | 488 | 488 | 488 | 488 | 488 | 488 | 488 | 488 | 488 | 488 | 488 | 488 | 488 |
| [75-100) | 745 | 745 | 745 | 745 | 745 | 745 | 745 | 745 | 745 | 745 | 745 | 745 | 745 | 745 | 745 | 745 | 745 | 745 | 745 | 745 | 745 | 745 | 745 | 745 | 745 | 745 | 745 | 745 | 745 | 745 | 745 | 745 | 745 | 745 | 745 | 745 | 745 | 745 | 745 | 745 | 745 | 745 | 745 | 745 | 745 | 745 | 745 |
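Because every column is fully populated, `groupby('weight').count()` repeats the same number across all 47 columns. In the table above, 54238 of the 56000 rows (about 97%) carry `?` for weight. A more compact way to get the same distribution is `value_counts()` or `groupby(...).size()`. The sketch below uses a small hypothetical frame named `toy` in place of `df`.

```python
import pandas as pd

# Hypothetical stand-in for the notebook's df.
toy = pd.DataFrame({
    'weight': ['?', '?', '[75-100)', '?', '[50-75)', '[75-100)'],
    'age': [7, 8, 6, 5, 7, 7],
})

# One count per category, most frequent first.
weight_counts = toy['weight'].value_counts()

# Same counts, ordered by label like the groupby table above.
weight_sizes = toy.groupby('weight').size()

# Share of rows where the weight is recorded as '?'.
missing_share = (toy['weight'] == '?').mean()
```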
# Recode the 'weight' bands as ordinal integers; '?' (weight not recorded) becomes 0:
# ? = 0
# [0-25) = 1
# [25-50) = 2
# [50-75) = 3
# [75-100) = 4
# [100-125) = 5
# [125-150) = 6
# [150-175) = 7
# [175-200) = 8
# >200 = 9
df = df.replace({'weight': {'?': 0, '[0-25)': 1, '[25-50)': 2, '[50-75)': 3, '[75-100)': 4, '[100-125)': 5, '[125-150)': 6, '[150-175)': 7, '[175-200)': 8, '>200': 9}})
df.head(2)
| | race | gender | age | weight | admission_type_id | discharge_disposition_id | admission_source_id | time_in_hospital | payer_code | MED_SPEC_NUM | num_lab_procedures | num_procedures | num_medications | number_outpatient | number_emergency | number_inpatient | DIAG_CAT_1 | DIAG_CAT_2 | DIAG_CAT_3 | number_diagnoses | max_glu_serum | A1Cresult | metformin | repaglinide | nateglinide | chlorpropamide | glimepiride | acetohexamide | glipizide | glyburide | tolbutamide | pioglitazone | rosiglitazone | acarbose | miglitol | troglitazone | tolazamide | examide | citoglipton | insulin | glyburide.metformin | glipizide.metformin | glimepiride.pioglitazone | metformin.rosiglitazone | metformin.pioglitazone | change | diabetesMed | readmitted |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 3 | 0 | 8 | 0 | 3 | 1 | 4 | 5 | ? | 0 | 39 | 3 | 11 | 0 | 0 | 0 | 10 | 4 | 18 | 7 | None | None | No | No | No | No | No | No | No | Steady | No | No | No | No | No | No | No | No | No | No | No | No | No | No | No | No | Yes | 1 |
| 1 | 3 | 0 | 7 | 0 | 5 | 3 | 1 | 6 | MC | 0 | 79 | 1 | 25 | 3 | 0 | 0 | 16 | 13 | 16 | 9 | None | None | Steady | No | No | No | No | No | No | Steady | No | No | No | No | No | No | No | No | No | Up | No | No | No | No | No | Ch | Yes | 0 |
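Coding `?` as 0 places unrecorded weights at the bottom of the ordinal scale, just below `[0-25)`. One possible alternative, shown here only as a sketch and not as what this notebook does, is to leave missing weights as NaN and add a separate indicator column. The frame `toy` and the column names `weight_code` and `weight_missing` are hypothetical.

```python
import pandas as pd

# Hypothetical stand-in for the notebook's df.
toy = pd.DataFrame({'weight': ['?', '[75-100)', '>200', '?', '[0-25)']})

# Same ordinal codes as above, but without an entry for '?'.
weight_codes = {'[0-25)': 1, '[25-50)': 2, '[50-75)': 3, '[75-100)': 4,
                '[100-125)': 5, '[125-150)': 6, '[150-175)': 7,
                '[175-200)': 8, '>200': 9}

# '?' is absent from the mapping, so map() leaves those rows as NaN.
toy['weight_code'] = toy['weight'].map(weight_codes)

# 1 where the weight was not recorded, 0 otherwise.
toy['weight_missing'] = toy['weight_code'].isna().astype(int)
```

Many scikit-learn estimators do not accept NaN inputs, so the NaN values would still need imputing (or the column dropping) before model fitting.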
#distribution of pay types in the payer_code column
df.groupby('payer_code').count()
| payer_code | race | gender | age | weight | admission_type_id | discharge_disposition_id | admission_source_id | time_in_hospital | MED_SPEC_NUM | num_lab_procedures | num_procedures | num_medications | number_outpatient | number_emergency | number_inpatient | DIAG_CAT_1 | DIAG_CAT_2 | DIAG_CAT_3 | number_diagnoses | max_glu_serum | A1Cresult | metformin | repaglinide | nateglinide | chlorpropamide | glimepiride | acetohexamide | glipizide | glyburide | tolbutamide | pioglitazone | rosiglitazone | acarbose | miglitol | troglitazone | tolazamide | examide | citoglipton | insulin | glyburide.metformin | glipizide.metformin | glimepiride.pioglitazone | metformin.rosiglitazone | metformin.pioglitazone | change | diabetesMed | readmitted |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| ? | 22153 | 22153 | 22153 | 22153 | 22153 | 22153 | 22153 | 22153 | 22153 | 22153 | 22153 | 22153 | 22153 | 22153 | 22153 | 22153 | 22153 | 22153 | 22153 | 22153 | 22153 | 22153 | 22153 | 22153 | 22153 | 22153 | 22153 | 22153 | 22153 | 22153 | 22153 | 22153 | 22153 | 22153 | 22153 | 22153 | 22153 | 22153 | 22153 | 22153 | 22153 | 22153 | 22153 | 22153 | 22153 | 22153 | 22153 |
| BC | 2550 | 2550 | 2550 | 2550 | 2550 | 2550 | 2550 | 2550 | 2550 | 2550 | 2550 | 2550 | 2550 | 2550 | 2550 | 2550 | 2550 | 2550 | 2550 | 2550 | 2550 | 2550 | 2550 | 2550 | 2550 | 2550 | 2550 | 2550 | 2550 | 2550 | 2550 | 2550 | 2550 | 2550 | 2550 | 2550 | 2550 | 2550 | 2550 | 2550 | 2550 | 2550 | 2550 | 2550 | 2550 | 2550 | 2550 |
| CH | 81 | 81 | 81 | 81 | 81 | 81 | 81 | 81 | 81 | 81 | 81 | 81 | 81 | 81 | 81 | 81 | 81 | 81 | 81 | 81 | 81 | 81 | 81 | 81 | 81 | 81 | 81 | 81 | 81 | 81 | 81 | 81 | 81 | 81 | 81 | 81 | 81 | 81 | 81 | 81 | 81 | 81 | 81 | 81 | 81 | 81 | 81 |
| CM | 1044 | 1044 | 1044 | 1044 | 1044 | 1044 | 1044 | 1044 | 1044 | 1044 | 1044 | 1044 | 1044 | 1044 | 1044 | 1044 | 1044 | 1044 | 1044 | 1044 | 1044 | 1044 | 1044 | 1044 | 1044 | 1044 | 1044 | 1044 | 1044 | 1044 | 1044 | 1044 | 1044 | 1044 | 1044 | 1044 | 1044 | 1044 | 1044 | 1044 | 1044 | 1044 | 1044 | 1044 | 1044 | 1044 | 1044 |
| CP | 1405 | 1405 | 1405 | 1405 | 1405 | 1405 | 1405 | 1405 | 1405 | 1405 | 1405 | 1405 | 1405 | 1405 | 1405 | 1405 | 1405 | 1405 | 1405 | 1405 | 1405 | 1405 | 1405 | 1405 | 1405 | 1405 | 1405 | 1405 | 1405 | 1405 | 1405 | 1405 | 1405 | 1405 | 1405 | 1405 | 1405 | 1405 | 1405 | 1405 | 1405 | 1405 | 1405 | 1405 | 1405 | 1405 | 1405 |
| DM | 309 | 309 | 309 | 309 | 309 | 309 | 309 | 309 | 309 | 309 | 309 | 309 | 309 | 309 | 309 | 309 | 309 | 309 | 309 | 309 | 309 | 309 | 309 | 309 | 309 | 309 | 309 | 309 | 309 | 309 | 309 | 309 | 309 | 309 | 309 | 309 | 309 | 309 | 309 | 309 | 309 | 309 | 309 | 309 | 309 | 309 | 309 |
| HM | 3506 | 3506 | 3506 | 3506 | 3506 | 3506 | 3506 | 3506 | 3506 | 3506 | 3506 | 3506 | 3506 | 3506 | 3506 | 3506 | 3506 | 3506 | 3506 | 3506 | 3506 | 3506 | 3506 | 3506 | 3506 | 3506 | 3506 | 3506 | 3506 | 3506 | 3506 | 3506 | 3506 | 3506 | 3506 | 3506 | 3506 | 3506 | 3506 | 3506 | 3506 | 3506 | 3506 | 3506 | 3506 | 3506 | 3506 |
| MC | 17855 | 17855 | 17855 | 17855 | 17855 | 17855 | 17855 | 17855 | 17855 | 17855 | 17855 | 17855 | 17855 | 17855 | 17855 | 17855 | 17855 | 17855 | 17855 | 17855 | 17855 | 17855 | 17855 | 17855 | 17855 | 17855 | 17855 | 17855 | 17855 | 17855 | 17855 | 17855 | 17855 | 17855 | 17855 | 17855 | 17855 | 17855 | 17855 | 17855 | 17855 | 17855 | 17855 | 17855 | 17855 | 17855 | 17855 |
| MD | 1977 | 1977 | 1977 | 1977 | 1977 | 1977 | 1977 | 1977 | 1977 | 1977 | 1977 | 1977 | 1977 | 1977 | 1977 | 1977 | 1977 | 1977 | 1977 | 1977 | 1977 | 1977 | 1977 | 1977 | 1977 | 1977 | 1977 | 1977 | 1977 | 1977 | 1977 | 1977 | 1977 | 1977 | 1977 | 1977 | 1977 | 1977 | 1977 | 1977 | 1977 | 1977 | 1977 | 1977 | 1977 | 1977 | 1977 |
| MP | 46 | 46 | 46 | 46 | 46 | 46 | 46 | 46 | 46 | 46 | 46 | 46 | 46 | 46 | 46 | 46 | 46 | 46 | 46 | 46 | 46 | 46 | 46 | 46 | 46 | 46 | 46 | 46 | 46 | 46 | 46 | 46 | 46 | 46 | 46 | 46 | 46 | 46 | 46 | 46 | 46 | 46 | 46 | 46 | 46 | 46 | 46 |
| OG | 564 | 564 | 564 | 564 | 564 | 564 | 564 | 564 | 564 | 564 | 564 | 564 | 564 | 564 | 564 | 564 | 564 | 564 | 564 | 564 | 564 | 564 | 564 | 564 | 564 | 564 | 564 | 564 | 564 | 564 | 564 | 564 | 564 | 564 | 564 | 564 | 564 | 564 | 564 | 564 | 564 | 564 | 564 | 564 | 564 | 564 | 564 |
| OT | 44 | 44 | 44 | 44 | 44 | 44 | 44 | 44 | 44 | 44 | 44 | 44 | 44 | 44 | 44 | 44 | 44 | 44 | 44 | 44 | 44 | 44 | 44 | 44 | 44 | 44 | 44 | 44 | 44 | 44 | 44 | 44 | 44 | 44 | 44 | 44 | 44 | 44 | 44 | 44 | 44 | 44 | 44 | 44 | 44 | 44 | 44 |
| PO | 312 | 312 | 312 | 312 | 312 | 312 | 312 | 312 | 312 | 312 | 312 | 312 | 312 | 312 | 312 | 312 | 312 | 312 | 312 | 312 | 312 | 312 | 312 | 312 | 312 | 312 | 312 | 312 | 312 | 312 | 312 | 312 | 312 | 312 | 312 | 312 | 312 | 312 | 312 | 312 | 312 | 312 | 312 | 312 | 312 | 312 | 312 |
| SI | 32 | 32 | 32 | 32 | 32 | 32 | 32 | 32 | 32 | 32 | 32 | 32 | 32 | 32 | 32 | 32 | 32 | 32 | 32 | 32 | 32 | 32 | 32 | 32 | 32 | 32 | 32 | 32 | 32 | 32 | 32 | 32 | 32 | 32 | 32 | 32 | 32 | 32 | 32 | 32 | 32 | 32 | 32 | 32 | 32 | 32 | 32 |
| SP | 2759 | 2759 | 2759 | 2759 | 2759 | 2759 | 2759 | 2759 | 2759 | 2759 | 2759 | 2759 | 2759 | 2759 | 2759 | 2759 | 2759 | 2759 | 2759 | 2759 | 2759 | 2759 | 2759 | 2759 | 2759 | 2759 | 2759 | 2759 | 2759 | 2759 | 2759 | 2759 | 2759 | 2759 | 2759 | 2759 | 2759 | 2759 | 2759 | 2759 | 2759 | 2759 | 2759 | 2759 | 2759 | 2759 | 2759 |
| UN | 1293 | 1293 | 1293 | 1293 | 1293 | 1293 | 1293 | 1293 | 1293 | 1293 | 1293 | 1293 | 1293 | 1293 | 1293 | 1293 | 1293 | 1293 | 1293 | 1293 | 1293 | 1293 | 1293 | 1293 | 1293 | 1293 | 1293 | 1293 | 1293 | 1293 | 1293 | 1293 | 1293 | 1293 | 1293 | 1293 | 1293 | 1293 | 1293 | 1293 | 1293 | 1293 | 1293 | 1293 | 1293 | 1293 | 1293 |
| WC | 70 | 70 | 70 | 70 | 70 | 70 | 70 | 70 | 70 | 70 | 70 | 70 | 70 | 70 | 70 | 70 | 70 | 70 | 70 | 70 | 70 | 70 | 70 | 70 | 70 | 70 | 70 | 70 | 70 | 70 | 70 | 70 | 70 | 70 | 70 | 70 | 70 | 70 | 70 | 70 | 70 | 70 | 70 | 70 | 70 | 70 | 70 |
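# Replace each 'payer_code' label with an integer code:
# '?' (payer not recorded) = 0, then the remaining codes in alphabetical order.
# The integers are arbitrary labels, not a ranking.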
df = df.replace({'payer_code': {'?': 0, 'BC': 1, 'CH': 2, 'CM': 3, 'CP': 4, 'DM': 5, 'HM': 6, 'MC': 7, 'MD': 8, 'MP': 9, 'OG': 10, 'OT': 11, 'PO': 12, 'SI': 13, 'SP': 14, 'UN': 15, 'WC': 16}})
df.head(2)
| | race | gender | age | weight | admission_type_id | discharge_disposition_id | admission_source_id | time_in_hospital | payer_code | MED_SPEC_NUM | num_lab_procedures | num_procedures | num_medications | number_outpatient | number_emergency | number_inpatient | DIAG_CAT_1 | DIAG_CAT_2 | DIAG_CAT_3 | number_diagnoses | max_glu_serum | A1Cresult | metformin | repaglinide | nateglinide | chlorpropamide | glimepiride | acetohexamide | glipizide | glyburide | tolbutamide | pioglitazone | rosiglitazone | acarbose | miglitol | troglitazone | tolazamide | examide | citoglipton | insulin | glyburide.metformin | glipizide.metformin | glimepiride.pioglitazone | metformin.rosiglitazone | metformin.pioglitazone | change | diabetesMed | readmitted |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 3 | 0 | 8 | 0 | 3 | 1 | 4 | 5 | 0 | 0 | 39 | 3 | 11 | 0 | 0 | 0 | 10 | 4 | 18 | 7 | None | None | No | No | No | No | No | No | No | Steady | No | No | No | No | No | No | No | No | No | No | No | No | No | No | No | No | Yes | 1 |
| 1 | 3 | 0 | 7 | 0 | 5 | 3 | 1 | 6 | 7 | 0 | 79 | 1 | 25 | 3 | 0 | 0 | 16 | 13 | 16 | 9 | None | None | Steady | No | No | No | No | No | No | Steady | No | No | No | No | No | No | No | No | No | Up | No | No | No | No | No | Ch | Yes | 0 |
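Unlike `age` and `weight`, the payer codes have no natural order, so integer codes 0-16 imply a ranking that distance-based or linear models (KNN, logistic regression) may pick up on. One common alternative is one-hot encoding with `pd.get_dummies`. The sketch below is an illustration on a small hypothetical frame named `toy`, not a change to the notebook's pipeline; the `payer` prefix is also an assumption.

```python
import pandas as pd

# Hypothetical stand-in for the notebook's df.
toy = pd.DataFrame({'payer_code': ['?', 'MC', 'BC', 'MC', 'SP']})

# One indicator column per payer code; each row has exactly one column set.
payer_dummies = pd.get_dummies(toy['payer_code'], prefix='payer')

# Swap the original label column for its indicator columns.
toy_encoded = pd.concat([toy.drop(columns='payer_code'), payer_dummies], axis=1)
```

With 17 distinct payer codes this adds 17 columns, which is the usual trade-off against the single integer column used above.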
#distribution of categories in the max_glu_serum column
df.groupby('max_glu_serum').count()
| max_glu_serum | race | gender | age | weight | admission_type_id | discharge_disposition_id | admission_source_id | time_in_hospital | payer_code | MED_SPEC_NUM | num_lab_procedures | num_procedures | num_medications | number_outpatient | number_emergency | number_inpatient | DIAG_CAT_1 | DIAG_CAT_2 | DIAG_CAT_3 | number_diagnoses | A1Cresult | metformin | repaglinide | nateglinide | chlorpropamide | glimepiride | acetohexamide | glipizide | glyburide | tolbutamide | pioglitazone | rosiglitazone | acarbose | miglitol | troglitazone | tolazamide | examide | citoglipton | insulin | glyburide.metformin | glipizide.metformin | glimepiride.pioglitazone | metformin.rosiglitazone | metformin.pioglitazone | change | diabetesMed | readmitted |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| >200 | 800 | 800 | 800 | 800 | 800 | 800 | 800 | 800 | 800 | 800 | 800 | 800 | 800 | 800 | 800 | 800 | 800 | 800 | 800 | 800 | 800 | 800 | 800 | 800 | 800 | 800 | 800 | 800 | 800 | 800 | 800 | 800 | 800 | 800 | 800 | 800 | 800 | 800 | 800 | 800 | 800 | 800 | 800 | 800 | 800 | 800 | 800 |
| >300 | 692 | 692 | 692 | 692 | 692 | 692 | 692 | 692 | 692 | 692 | 692 | 692 | 692 | 692 | 692 | 692 | 692 | 692 | 692 | 692 | 692 | 692 | 692 | 692 | 692 | 692 | 692 | 692 | 692 | 692 | 692 | 692 | 692 | 692 | 692 | 692 | 692 | 692 | 692 | 692 | 692 | 692 | 692 | 692 | 692 | 692 | 692 |
| None | 53027 | 53027 | 53027 | 53027 | 53027 | 53027 | 53027 | 53027 | 53027 | 53027 | 53027 | 53027 | 53027 | 53027 | 53027 | 53027 | 53027 | 53027 | 53027 | 53027 | 53027 | 53027 | 53027 | 53027 | 53027 | 53027 | 53027 | 53027 | 53027 | 53027 | 53027 | 53027 | 53027 | 53027 | 53027 | 53027 | 53027 | 53027 | 53027 | 53027 | 53027 | 53027 | 53027 | 53027 | 53027 | 53027 | 53027 |
| Norm | 1481 | 1481 | 1481 | 1481 | 1481 | 1481 | 1481 | 1481 | 1481 | 1481 | 1481 | 1481 | 1481 | 1481 | 1481 | 1481 | 1481 | 1481 | 1481 | 1481 | 1481 | 1481 | 1481 | 1481 | 1481 | 1481 | 1481 | 1481 | 1481 | 1481 | 1481 | 1481 | 1481 | 1481 | 1481 | 1481 | 1481 | 1481 | 1481 | 1481 | 1481 | 1481 | 1481 | 1481 | 1481 | 1481 | 1481 |
#replace the values of the 'max_glu_serum' column:
# None = 0
# Norm = 1
# >200 = 2
# >300 = 3
df = df.replace({'max_glu_serum': {'None': 0, 'Norm': 1, '>200': 2, '>300': 3}})
df.head(2)
| | race | gender | age | weight | admission_type_id | discharge_disposition_id | admission_source_id | time_in_hospital | payer_code | MED_SPEC_NUM | num_lab_procedures | num_procedures | num_medications | number_outpatient | number_emergency | number_inpatient | DIAG_CAT_1 | DIAG_CAT_2 | DIAG_CAT_3 | number_diagnoses | max_glu_serum | A1Cresult | metformin | repaglinide | nateglinide | chlorpropamide | glimepiride | acetohexamide | glipizide | glyburide | tolbutamide | pioglitazone | rosiglitazone | acarbose | miglitol | troglitazone | tolazamide | examide | citoglipton | insulin | glyburide.metformin | glipizide.metformin | glimepiride.pioglitazone | metformin.rosiglitazone | metformin.pioglitazone | change | diabetesMed | readmitted |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 3 | 0 | 8 | 0 | 3 | 1 | 4 | 5 | 0 | 0 | 39 | 3 | 11 | 0 | 0 | 0 | 10 | 4 | 18 | 7 | 0 | None | No | No | No | No | No | No | No | Steady | No | No | No | No | No | No | No | No | No | No | No | No | No | No | No | No | Yes | 1 |
| 1 | 3 | 0 | 7 | 0 | 5 | 3 | 1 | 6 | 7 | 0 | 79 | 1 | 25 | 3 | 0 | 0 | 16 | 13 | 16 | 9 | 0 | None | Steady | No | No | No | No | No | No | Steady | No | No | No | No | No | No | No | No | No | Up | No | No | No | No | No | Ch | Yes | 0 |
#distribution of categories in the A1Cresult column
df.groupby('A1Cresult').count()
| A1Cresult | race | gender | age | weight | admission_type_id | discharge_disposition_id | admission_source_id | time_in_hospital | payer_code | MED_SPEC_NUM | num_lab_procedures | num_procedures | num_medications | number_outpatient | number_emergency | number_inpatient | DIAG_CAT_1 | DIAG_CAT_2 | DIAG_CAT_3 | number_diagnoses | max_glu_serum | metformin | repaglinide | nateglinide | chlorpropamide | glimepiride | acetohexamide | glipizide | glyburide | tolbutamide | pioglitazone | rosiglitazone | acarbose | miglitol | troglitazone | tolazamide | examide | citoglipton | insulin | glyburide.metformin | glipizide.metformin | glimepiride.pioglitazone | metformin.rosiglitazone | metformin.pioglitazone | change | diabetesMed | readmitted |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| >7 | 2143 | 2143 | 2143 | 2143 | 2143 | 2143 | 2143 | 2143 | 2143 | 2143 | 2143 | 2143 | 2143 | 2143 | 2143 | 2143 | 2143 | 2143 | 2143 | 2143 | 2143 | 2143 | 2143 | 2143 | 2143 | 2143 | 2143 | 2143 | 2143 | 2143 | 2143 | 2143 | 2143 | 2143 | 2143 | 2143 | 2143 | 2143 | 2143 | 2143 | 2143 | 2143 | 2143 | 2143 | 2143 | 2143 | 2143 |
| >8 | 4523 | 4523 | 4523 | 4523 | 4523 | 4523 | 4523 | 4523 | 4523 | 4523 | 4523 | 4523 | 4523 | 4523 | 4523 | 4523 | 4523 | 4523 | 4523 | 4523 | 4523 | 4523 | 4523 | 4523 | 4523 | 4523 | 4523 | 4523 | 4523 | 4523 | 4523 | 4523 | 4523 | 4523 | 4523 | 4523 | 4523 | 4523 | 4523 | 4523 | 4523 | 4523 | 4523 | 4523 | 4523 | 4523 | 4523 |
| None | 46560 | 46560 | 46560 | 46560 | 46560 | 46560 | 46560 | 46560 | 46560 | 46560 | 46560 | 46560 | 46560 | 46560 | 46560 | 46560 | 46560 | 46560 | 46560 | 46560 | 46560 | 46560 | 46560 | 46560 | 46560 | 46560 | 46560 | 46560 | 46560 | 46560 | 46560 | 46560 | 46560 | 46560 | 46560 | 46560 | 46560 | 46560 | 46560 | 46560 | 46560 | 46560 | 46560 | 46560 | 46560 | 46560 | 46560 |
| Norm | 2774 | 2774 | 2774 | 2774 | 2774 | 2774 | 2774 | 2774 | 2774 | 2774 | 2774 | 2774 | 2774 | 2774 | 2774 | 2774 | 2774 | 2774 | 2774 | 2774 | 2774 | 2774 | 2774 | 2774 | 2774 | 2774 | 2774 | 2774 | 2774 | 2774 | 2774 | 2774 | 2774 | 2774 | 2774 | 2774 | 2774 | 2774 | 2774 | 2774 | 2774 | 2774 | 2774 | 2774 | 2774 | 2774 | 2774 |
#replace the values of the 'A1Cresult' column:
# None = 0
# Norm = 1
# >7 = 2
# >8 = 3
df = df.replace({'A1Cresult': {'None': 0, 'Norm': 1, '>7': 2, '>8': 3}})
df.head(2)
| | race | gender | age | weight | admission_type_id | discharge_disposition_id | admission_source_id | time_in_hospital | payer_code | MED_SPEC_NUM | num_lab_procedures | num_procedures | num_medications | number_outpatient | number_emergency | number_inpatient | DIAG_CAT_1 | DIAG_CAT_2 | DIAG_CAT_3 | number_diagnoses | max_glu_serum | A1Cresult | metformin | repaglinide | nateglinide | chlorpropamide | glimepiride | acetohexamide | glipizide | glyburide | tolbutamide | pioglitazone | rosiglitazone | acarbose | miglitol | troglitazone | tolazamide | examide | citoglipton | insulin | glyburide.metformin | glipizide.metformin | glimepiride.pioglitazone | metformin.rosiglitazone | metformin.pioglitazone | change | diabetesMed | readmitted |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 3 | 0 | 8 | 0 | 3 | 1 | 4 | 5 | 0 | 0 | 39 | 3 | 11 | 0 | 0 | 0 | 10 | 4 | 18 | 7 | 0 | 0 | No | No | No | No | No | No | No | Steady | No | No | No | No | No | No | No | No | No | No | No | No | No | No | No | No | Yes | 1 |
| 1 | 3 | 0 | 7 | 0 | 5 | 3 | 1 | 6 | 7 | 0 | 79 | 1 | 25 | 3 | 0 | 0 | 16 | 13 | 16 | 9 | 0 | 0 | Steady | No | No | No | No | No | No | Steady | No | No | No | No | No | No | No | No | No | Up | No | No | No | No | No | Ch | Yes | 0 |
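The `max_glu_serum` and `A1Cresult` recodings follow the same pattern, so the mappings can be kept in one dictionary and applied in a loop. This also makes it easy to fail loudly on a label that has no code. The sketch below is a minimal illustration on a hypothetical frame named `toy`; the dictionary name `column_codes` is an assumption, and the codes are the ones used above.

```python
import pandas as pd

# Hypothetical stand-in for the notebook's df.
toy = pd.DataFrame({
    'max_glu_serum': ['None', 'Norm', '>200', '>300'],
    'A1Cresult': ['>8', 'None', 'Norm', '>7'],
})

# One mapping per column, mirroring the codes listed in the comments above.
column_codes = {
    'max_glu_serum': {'None': 0, 'Norm': 1, '>200': 2, '>300': 3},
    'A1Cresult': {'None': 0, 'Norm': 1, '>7': 2, '>8': 3},
}

for column, codes in column_codes.items():
    # Stop early if the column holds a label with no assigned code.
    unknown = set(toy[column].unique()) - set(codes)
    if unknown:
        raise ValueError('unmapped labels in {}: {}'.format(column, sorted(unknown)))
    toy[column] = toy[column].map(codes)
```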
#distribution of Ch or No in the change column
df.groupby('change').count()
| change | race | gender | age | weight | admission_type_id | discharge_disposition_id | admission_source_id | time_in_hospital | payer_code | MED_SPEC_NUM | num_lab_procedures | num_procedures | num_medications | number_outpatient | number_emergency | number_inpatient | DIAG_CAT_1 | DIAG_CAT_2 | DIAG_CAT_3 | number_diagnoses | max_glu_serum | A1Cresult | metformin | repaglinide | nateglinide | chlorpropamide | glimepiride | acetohexamide | glipizide | glyburide | tolbutamide | pioglitazone | rosiglitazone | acarbose | miglitol | troglitazone | tolazamide | examide | citoglipton | insulin | glyburide.metformin | glipizide.metformin | glimepiride.pioglitazone | metformin.rosiglitazone | metformin.pioglitazone | diabetesMed | readmitted |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Ch | 25910 | 25910 | 25910 | 25910 | 25910 | 25910 | 25910 | 25910 | 25910 | 25910 | 25910 | 25910 | 25910 | 25910 | 25910 | 25910 | 25910 | 25910 | 25910 | 25910 | 25910 | 25910 | 25910 | 25910 | 25910 | 25910 | 25910 | 25910 | 25910 | 25910 | 25910 | 25910 | 25910 | 25910 | 25910 | 25910 | 25910 | 25910 | 25910 | 25910 | 25910 | 25910 | 25910 | 25910 | 25910 | 25910 | 25910 |
| No | 30090 | 30090 | 30090 | 30090 | 30090 | 30090 | 30090 | 30090 | 30090 | 30090 | 30090 | 30090 | 30090 | 30090 | 30090 | 30090 | 30090 | 30090 | 30090 | 30090 | 30090 | 30090 | 30090 | 30090 | 30090 | 30090 | 30090 | 30090 | 30090 | 30090 | 30090 | 30090 | 30090 | 30090 | 30090 | 30090 | 30090 | 30090 | 30090 | 30090 | 30090 | 30090 | 30090 | 30090 | 30090 | 30090 | 30090 |
#replace the values of the 'change' column:
# No = 0
# Ch = 1
df = df.replace({'change': {'No': 0, 'Ch': 1}})
df.head(2)
| | race | gender | age | weight | admission_type_id | discharge_disposition_id | admission_source_id | time_in_hospital | payer_code | MED_SPEC_NUM | num_lab_procedures | num_procedures | num_medications | number_outpatient | number_emergency | number_inpatient | DIAG_CAT_1 | DIAG_CAT_2 | DIAG_CAT_3 | number_diagnoses | max_glu_serum | A1Cresult | metformin | repaglinide | nateglinide | chlorpropamide | glimepiride | acetohexamide | glipizide | glyburide | tolbutamide | pioglitazone | rosiglitazone | acarbose | miglitol | troglitazone | tolazamide | examide | citoglipton | insulin | glyburide.metformin | glipizide.metformin | glimepiride.pioglitazone | metformin.rosiglitazone | metformin.pioglitazone | change | diabetesMed | readmitted |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 3 | 0 | 8 | 0 | 3 | 1 | 4 | 5 | 0 | 0 | 39 | 3 | 11 | 0 | 0 | 0 | 10 | 4 | 18 | 7 | 0 | 0 | No | No | No | No | No | No | No | Steady | No | No | No | No | No | No | No | No | No | No | No | No | No | No | No | 0 | Yes | 1 |
| 1 | 3 | 0 | 7 | 0 | 5 | 3 | 1 | 6 | 7 | 0 | 79 | 1 | 25 | 3 | 0 | 0 | 16 | 13 | 16 | 9 | 0 | 0 | Steady | No | No | No | No | No | No | Steady | No | No | No | No | No | No | No | No | No | Up | No | No | No | No | No | 1 | Yes | 0 |
#distribution of No or Yes in the diabetesMed column
df.groupby('diabetesMed').count()
| diabetesMed | race | gender | age | weight | admission_type_id | discharge_disposition_id | admission_source_id | time_in_hospital | payer_code | MED_SPEC_NUM | num_lab_procedures | num_procedures | num_medications | number_outpatient | number_emergency | number_inpatient | DIAG_CAT_1 | DIAG_CAT_2 | DIAG_CAT_3 | number_diagnoses | max_glu_serum | A1Cresult | metformin | repaglinide | nateglinide | chlorpropamide | glimepiride | acetohexamide | glipizide | glyburide | tolbutamide | pioglitazone | rosiglitazone | acarbose | miglitol | troglitazone | tolazamide | examide | citoglipton | insulin | glyburide.metformin | glipizide.metformin | glimepiride.pioglitazone | metformin.rosiglitazone | metformin.pioglitazone | change | readmitted |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| No | 12890 | 12890 | 12890 | 12890 | 12890 | 12890 | 12890 | 12890 | 12890 | 12890 | 12890 | 12890 | 12890 | 12890 | 12890 | 12890 | 12890 | 12890 | 12890 | 12890 | 12890 | 12890 | 12890 | 12890 | 12890 | 12890 | 12890 | 12890 | 12890 | 12890 | 12890 | 12890 | 12890 | 12890 | 12890 | 12890 | 12890 | 12890 | 12890 | 12890 | 12890 | 12890 | 12890 | 12890 | 12890 | 12890 | 12890 |
| Yes | 43110 | 43110 | 43110 | 43110 | 43110 | 43110 | 43110 | 43110 | 43110 | 43110 | 43110 | 43110 | 43110 | 43110 | 43110 | 43110 | 43110 | 43110 | 43110 | 43110 | 43110 | 43110 | 43110 | 43110 | 43110 | 43110 | 43110 | 43110 | 43110 | 43110 | 43110 | 43110 | 43110 | 43110 | 43110 | 43110 | 43110 | 43110 | 43110 | 43110 | 43110 | 43110 | 43110 | 43110 | 43110 | 43110 | 43110 |
#replace the values of the 'diabetesMed' column:
# No = 0
# Yes = 1
df = df.replace({'diabetesMed': {'No': 0, 'Yes': 1}})
df.head(2)
| | race | gender | age | weight | admission_type_id | discharge_disposition_id | admission_source_id | time_in_hospital | payer_code | MED_SPEC_NUM | num_lab_procedures | num_procedures | num_medications | number_outpatient | number_emergency | number_inpatient | DIAG_CAT_1 | DIAG_CAT_2 | DIAG_CAT_3 | number_diagnoses | max_glu_serum | A1Cresult | metformin | repaglinide | nateglinide | chlorpropamide | glimepiride | acetohexamide | glipizide | glyburide | tolbutamide | pioglitazone | rosiglitazone | acarbose | miglitol | troglitazone | tolazamide | examide | citoglipton | insulin | glyburide.metformin | glipizide.metformin | glimepiride.pioglitazone | metformin.rosiglitazone | metformin.pioglitazone | change | diabetesMed | readmitted |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 3 | 0 | 8 | 0 | 3 | 1 | 4 | 5 | 0 | 0 | 39 | 3 | 11 | 0 | 0 | 0 | 10 | 4 | 18 | 7 | 0 | 0 | No | No | No | No | No | No | No | Steady | No | No | No | No | No | No | No | No | No | No | No | No | No | No | No | 0 | 1 | 1 |
| 1 | 3 | 0 | 7 | 0 | 5 | 3 | 1 | 6 | 7 | 0 | 79 | 1 | 25 | 3 | 0 | 0 | 16 | 13 | 16 | 9 | 0 | 0 | Steady | No | No | No | No | No | No | Steady | No | No | No | No | No | No | No | No | No | Up | No | No | No | No | No | 1 | 1 | 0 |
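At this point `change` and `diabetesMed` are numeric, but the preview above still shows text labels such as `No`, `Steady` and `Up` in the medication columns. A quick way to list which columns still need encoding is `select_dtypes`. The sketch below runs on a hypothetical four-column frame named `toy`; the variable `remaining` is likewise only illustrative.

```python
import pandas as pd

# Hypothetical stand-in for the notebook's df, with two flag columns
# and two medication columns that still hold text labels.
toy = pd.DataFrame({
    'change': ['No', 'Ch'],
    'diabetesMed': ['Yes', 'Yes'],
    'metformin': ['No', 'Steady'],
    'insulin': ['No', 'Up'],
})

# Same binary codes as above.
toy['change'] = toy['change'].map({'No': 0, 'Ch': 1})
toy['diabetesMed'] = toy['diabetesMed'].map({'No': 0, 'Yes': 1})

# Columns that are not numeric yet and therefore still need encoding.
remaining = toy.select_dtypes(exclude='number').columns.tolist()
```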
#distribution of the medical specialty categories in the MED_SPEC_NUM column
df.groupby('MED_SPEC_NUM').count()
| MED_SPEC_NUM | race | gender | age | weight | admission_type_id | discharge_disposition_id | admission_source_id | time_in_hospital | payer_code | num_lab_procedures | num_procedures | num_medications | number_outpatient | number_emergency | number_inpatient | DIAG_CAT_1 | DIAG_CAT_2 | DIAG_CAT_3 | number_diagnoses | max_glu_serum | A1Cresult | metformin | repaglinide | nateglinide | chlorpropamide | glimepiride | acetohexamide | glipizide | glyburide | tolbutamide | pioglitazone | rosiglitazone | acarbose | miglitol | troglitazone | tolazamide | examide | citoglipton | insulin | glyburide.metformin | glipizide.metformin | glimepiride.pioglitazone | metformin.rosiglitazone | metformin.pioglitazone | change | diabetesMed | readmitted |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 27562 | 27562 | 27562 | 27562 | 27562 | 27562 | 27562 | 27562 | 27562 | 27562 | 27562 | 27562 | 27562 | 27562 | 27562 | 27562 | 27562 | 27562 | 27562 | 27562 | 27562 | 27562 | 27562 | 27562 | 27562 | 27562 | 27562 | 27562 | 27562 | 27562 | 27562 | 27562 | 27562 | 27562 | 27562 | 27562 | 27562 | 27562 | 27562 | 27562 | 27562 | 27562 | 27562 | 27562 | 27562 | 27562 | 27562 |
| 1 | 4 | 4 | 4 | 4 | 4 | 4 | 4 | 4 | 4 | 4 | 4 | 4 | 4 | 4 | 4 | 4 | 4 | 4 | 4 | 4 | 4 | 4 | 4 | 4 | 4 | 4 | 4 | 4 | 4 | 4 | 4 | 4 | 4 | 4 | 4 | 4 | 4 | 4 | 4 | 4 | 4 | 4 | 4 | 4 | 4 | 4 | 4 |
| 2 | 7 | 7 | 7 | 7 | 7 | 7 | 7 | 7 | 7 | 7 | 7 | 7 | 7 | 7 | 7 | 7 | 7 | 7 | 7 | 7 | 7 | 7 | 7 | 7 | 7 | 7 | 7 | 7 | 7 | 7 | 7 | 7 | 7 | 7 | 7 | 7 | 7 | 7 | 7 | 7 | 7 | 7 | 7 | 7 | 7 | 7 | 7 |
| 3 | 12 | 12 | 12 | 12 | 12 | 12 | 12 | 12 | 12 | 12 | 12 | 12 | 12 | 12 | 12 | 12 | 12 | 12 | 12 | 12 | 12 | 12 | 12 | 12 | 12 | 12 | 12 | 12 | 12 | 12 | 12 | 12 | 12 | 12 | 12 | 12 | 12 | 12 | 12 | 12 | 12 | 12 | 12 | 12 | 12 | 12 | 12 |
| 4 | 2912 | 2912 | 2912 | 2912 | 2912 | 2912 | 2912 | 2912 | 2912 | 2912 | 2912 | 2912 | 2912 | 2912 | 2912 | 2912 | 2912 | 2912 | 2912 | 2912 | 2912 | 2912 | 2912 | 2912 | 2912 | 2912 | 2912 | 2912 | 2912 | 2912 | 2912 | 2912 | 2912 | 2912 | 2912 | 2912 | 2912 | 2912 | 2912 | 2912 | 2912 | 2912 | 2912 | 2912 | 2912 | 2912 | 2912 |
| 5 | 4 | 4 | 4 | 4 | 4 | 4 | 4 | 4 | 4 | 4 | 4 | 4 | 4 | 4 | 4 | 4 | 4 | 4 | 4 | 4 | 4 | 4 | 4 | 4 | 4 | 4 | 4 | 4 | 4 | 4 | 4 | 4 | 4 | 4 | 4 | 4 | 4 | 4 | 4 | 4 | 4 | 4 | 4 | 4 | 4 | 4 | 4 |
| 6 | 4 | 4 | 4 | 4 | 4 | 4 | 4 | 4 | 4 | 4 | 4 | 4 | 4 | 4 | 4 | 4 | 4 | 4 | 4 | 4 | 4 | 4 | 4 | 4 | 4 | 4 | 4 | 4 | 4 | 4 | 4 | 4 | 4 | 4 | 4 | 4 | 4 | 4 | 4 | 4 | 4 | 4 | 4 | 4 | 4 | 4 | 4 |
| 7 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 2 |
| 8 | 4189 | 4189 | 4189 | 4189 | 4189 | 4189 | 4189 | 4189 | 4189 | 4189 | 4189 | 4189 | 4189 | 4189 | 4189 | 4189 | 4189 | 4189 | 4189 | 4189 | 4189 | 4189 | 4189 | 4189 | 4189 | 4189 | 4189 | 4189 | 4189 | 4189 | 4189 | 4189 | 4189 | 4189 | 4189 | 4189 | 4189 | 4189 | 4189 | 4189 | 4189 | 4189 | 4189 | 4189 | 4189 | 4189 | 4189 |
| 9 | 66 | 66 | 66 | 66 | 66 | 66 | 66 | 66 | 66 | 66 | 66 | 66 | 66 | 66 | 66 | 66 | 66 | 66 | 66 | 66 | 66 | 66 | 66 | 66 | 66 | 66 | 66 | 66 | 66 | 66 | 66 | 66 | 66 | 66 | 66 | 66 | 66 | 66 | 66 | 66 | 66 | 66 | 66 | 66 | 66 | 66 | 66 |
| 10 | 5 | 5 | 5 | 5 | 5 | 5 | 5 | 5 | 5 | 5 | 5 | 5 | 5 | 5 | 5 | 5 | 5 | 5 | 5 | 5 | 5 | 5 | 5 | 5 | 5 | 5 | 5 | 5 | 5 | 5 | 5 | 5 | 5 | 5 | 5 | 5 | 5 | 5 | 5 | 5 | 5 | 5 | 5 | 5 | 5 | 5 | 5 |
| 11 | 4032 | 4032 | 4032 | 4032 | 4032 | 4032 | 4032 | 4032 | 4032 | 4032 | 4032 | 4032 | 4032 | 4032 | 4032 | 4032 | 4032 | 4032 | 4032 | 4032 | 4032 | 4032 | 4032 | 4032 | 4032 | 4032 | 4032 | 4032 | 4032 | 4032 | 4032 | 4032 | 4032 | 4032 | 4032 | 4032 | 4032 | 4032 | 4032 | 4032 | 4032 | 4032 | 4032 | 4032 | 4032 | 4032 | 4032 |
| 12 | 318 | 318 | 318 | 318 | 318 | 318 | 318 | 318 | 318 | 318 | 318 | 318 | 318 | 318 | 318 | 318 | 318 | 318 | 318 | 318 | 318 | 318 | 318 | 318 | 318 | 318 | 318 | 318 | 318 | 318 | 318 | 318 | 318 | 318 | 318 | 318 | 318 | 318 | 318 | 318 | 318 | 318 | 318 | 318 | 318 | 318 | 318 |
| 13 | 37 | 37 | 37 | 37 | 37 | 37 | 37 | 37 | 37 | 37 | 37 | 37 | 37 | 37 | 37 | 37 | 37 | 37 | 37 | 37 | 37 | 37 | 37 | 37 | 37 | 37 | 37 | 37 | 37 | 37 | 37 | 37 | 37 | 37 | 37 | 37 | 37 | 37 | 37 | 37 | 37 | 37 | 37 | 37 | 37 | 37 | 37 |
| 14 | 57 | 57 | 57 | 57 | 57 | 57 | 57 | 57 | 57 | 57 | 57 | 57 | 57 | 57 | 57 | 57 | 57 | 57 | 57 | 57 | 57 | 57 | 57 | 57 | 57 | 57 | 57 | 57 | 57 | 57 | 57 | 57 | 57 | 57 | 57 | 57 | 57 | 57 | 57 | 57 | 57 | 57 | 57 | 57 | 57 | 57 | 57 |
| 15 | 118 | 118 | 118 | 118 | 118 | 118 | 118 | 118 | 118 | 118 | 118 | 118 | 118 | 118 | 118 | 118 | 118 | 118 | 118 | 118 | 118 | 118 | 118 | 118 | 118 | 118 | 118 | 118 | 118 | 118 | 118 | 118 | 118 | 118 | 118 | 118 | 118 | 118 | 118 | 118 | 118 | 118 | 118 | 118 | 118 | 118 | 118 |
| 16 | 31 | 31 | 31 | 31 | 31 | 31 | 31 | 31 | 31 | 31 | 31 | 31 | 31 | 31 | 31 | 31 | 31 | 31 | 31 | 31 | 31 | 31 | 31 | 31 | 31 | 31 | 31 | 31 | 31 | 31 | 31 | 31 | 31 | 31 | 31 | 31 | 31 | 31 | 31 | 31 | 31 | 31 | 31 | 31 | 31 | 31 | 31 |
| 17 | 23 | 23 | 23 | 23 | 23 | 23 | 23 | 23 | 23 | 23 | 23 | 23 | 23 | 23 | 23 | 23 | 23 | 23 | 23 | 23 | 23 | 23 | 23 | 23 | 23 | 23 | 23 | 23 | 23 | 23 | 23 | 23 | 23 | 23 | 23 | 23 | 23 | 23 | 23 | 23 | 23 | 23 | 23 | 23 | 23 | 23 | 23 |
| 18 | 8055 | 8055 | 8055 | 8055 | 8055 | 8055 | 8055 | 8055 | 8055 | 8055 | 8055 | 8055 | 8055 | 8055 | 8055 | 8055 | 8055 | 8055 | 8055 | 8055 | 8055 | 8055 | 8055 | 8055 | 8055 | 8055 | 8055 | 8055 | 8055 | 8055 | 8055 | 8055 | 8055 | 8055 | 8055 | 8055 | 8055 | 8055 | 8055 | 8055 | 8055 | 8055 | 8055 | 8055 | 8055 | 8055 | 8055 |
| 19 | 914 | 914 | 914 | 914 | 914 | 914 | 914 | 914 | 914 | 914 | 914 | 914 | 914 | 914 | 914 | 914 | 914 | 914 | 914 | 914 | 914 | 914 | 914 | 914 | 914 | 914 | 914 | 914 | 914 | 914 | 914 | 914 | 914 | 914 | 914 | 914 | 914 | 914 | 914 | 914 | 914 | 914 | 914 | 914 | 914 | 914 | 914 |
| 20 | 122 | 122 | 122 | 122 | 122 | 122 | 122 | 122 | 122 | 122 | 122 | 122 | 122 | 122 | 122 | 122 | 122 | 122 | 122 | 122 | 122 | 122 | 122 | 122 | 122 | 122 | 122 | 122 | 122 | 122 | 122 | 122 | 122 | 122 | 122 | 122 | 122 | 122 | 122 | 122 | 122 | 122 | 122 | 122 | 122 | 122 | 122 |
| 21 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 |
| 22 | 15 | 15 | 15 | 15 | 15 | 15 | 15 | 15 | 15 | 15 | 15 | 15 | 15 | 15 | 15 | 15 | 15 | 15 | 15 | 15 | 15 | 15 | 15 | 15 | 15 | 15 | 15 | 15 | 15 | 15 | 15 | 15 | 15 | 15 | 15 | 15 | 15 | 15 | 15 | 15 | 15 | 15 | 15 | 15 | 15 | 15 | 15 |
| 23 | 10 | 10 | 10 | 10 | 10 | 10 | 10 | 10 | 10 | 10 | 10 | 10 | 10 | 10 | 10 | 10 | 10 | 10 | 10 | 10 | 10 | 10 | 10 | 10 | 10 | 10 | 10 | 10 | 10 | 10 | 10 | 10 | 10 | 10 | 10 | 10 | 10 | 10 | 10 | 10 | 10 | 10 | 10 | 10 | 10 | 10 | 10 |
| 24 | 361 | 361 | 361 | 361 | 361 | 361 | 361 | 361 | 361 | 361 | 361 | 361 | 361 | 361 | 361 | 361 | 361 | 361 | 361 | 361 | 361 | 361 | 361 | 361 | 361 | 361 | 361 | 361 | 361 | 361 | 361 | 361 | 361 | 361 | 361 | 361 | 361 | 361 | 361 | 361 | 361 | 361 | 361 | 361 | 361 | 361 | 361 |
| 25 | 198 | 198 | 198 | 198 | 198 | 198 | 198 | 198 | 198 | 198 | 198 | 198 | 198 | 198 | 198 | 198 | 198 | 198 | 198 | 198 | 198 | 198 | 198 | 198 | 198 | 198 | 198 | 198 | 198 | 198 | 198 | 198 | 198 | 198 | 198 | 198 | 198 | 198 | 198 | 198 | 198 | 198 | 198 | 198 | 198 | 198 | 198 |
| 26 | 17 | 17 | 17 | 17 | 17 | 17 | 17 | 17 | 17 | 17 | 17 | 17 | 17 | 17 | 17 | 17 | 17 | 17 | 17 | 17 | 17 | 17 | 17 | 17 | 17 | 17 | 17 | 17 | 17 | 17 | 17 | 17 | 17 | 17 | 17 | 17 | 17 | 17 | 17 | 17 | 17 | 17 | 17 | 17 | 17 | 17 | 17 |
| 27 | 764 | 764 | 764 | 764 | 764 | 764 | 764 | 764 | 764 | 764 | 764 | 764 | 764 | 764 | 764 | 764 | 764 | 764 | 764 | 764 | 764 | 764 | 764 | 764 | 764 | 764 | 764 | 764 | 764 | 764 | 764 | 764 | 764 | 764 | 764 | 764 | 764 | 764 | 764 | 764 | 764 | 764 | 764 | 764 | 764 | 764 | 764 |
| 28 | 651 | 651 | 651 | 651 | 651 | 651 | 651 | 651 | 651 | 651 | 651 | 651 | 651 | 651 | 651 | 651 | 651 | 651 | 651 | 651 | 651 | 651 | 651 | 651 | 651 | 651 | 651 | 651 | 651 | 651 | 651 | 651 | 651 | 651 | 651 | 651 | 651 | 651 | 651 | 651 | 651 | 651 | 651 | 651 | 651 | 651 | 651 |
| 29 | 24 | 24 | 24 | 24 | 24 | 24 | 24 | 24 | 24 | 24 | 24 | 24 | 24 | 24 | 24 | 24 | 24 | 24 | 24 | 24 | 24 | 24 | 24 | 24 | 24 | 24 | 24 | 24 | 24 | 24 | 24 | 24 | 24 | 24 | 24 | 24 | 24 | 24 | 24 | 24 | 24 | 24 | 24 | 24 | 24 | 24 | 24 |
| 30 | 64 | 64 | 64 | 64 | 64 | 64 | 64 | 64 | 64 | 64 | 64 | 64 | 64 | 64 | 64 | 64 | 64 | 64 | 64 | 64 | 64 | 64 | 64 | 64 | 64 | 64 | 64 | 64 | 64 | 64 | 64 | 64 | 64 | 64 | 64 | 64 | 64 | 64 | 64 | 64 | 64 | 64 | 64 | 64 | 64 | 64 | 64 |
| 31 | 7 | 7 | 7 | 7 | 7 | 7 | 7 | 7 | 7 | 7 | 7 | 7 | 7 | 7 | 7 | 7 | 7 | 7 | 7 | 7 | 7 | 7 | 7 | 7 | 7 | 7 | 7 | 7 | 7 | 7 | 7 | 7 | 7 | 7 | 7 | 7 | 7 | 7 | 7 | 7 | 7 | 7 | 7 | 7 | 7 | 7 | 7 |
| 32 | 12 | 12 | 12 | 12 | 12 | 12 | 12 | 12 | 12 | 12 | 12 | 12 | 12 | 12 | 12 | 12 | 12 | 12 | 12 | 12 | 12 | 12 | 12 | 12 | 12 | 12 | 12 | 12 | 12 | 12 | 12 | 12 | 12 | 12 | 12 | 12 | 12 | 12 | 12 | 12 | 12 | 12 | 12 | 12 | 12 | 12 | 12 |
| 33 | 130 | 130 | 130 | 130 | 130 | 130 | 130 | 130 | 130 | 130 | 130 | 130 | 130 | 130 | 130 | 130 | 130 | 130 | 130 | 130 | 130 | 130 | 130 | 130 | 130 | 130 | 130 | 130 | 130 | 130 | 130 | 130 | 130 | 130 | 130 | 130 | 130 | 130 | 130 | 130 | 130 | 130 | 130 | 130 | 130 | 130 | 130 |
| 34 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 2 |
| 35 | 50 | 50 | 50 | 50 | 50 | 50 | 50 | 50 | 50 | 50 | 50 | 50 | 50 | 50 | 50 | 50 | 50 | 50 | 50 | 50 | 50 | 50 | 50 | 50 | 50 | 50 | 50 | 50 | 50 | 50 | 50 | 50 | 50 | 50 | 50 | 50 | 50 | 50 | 50 | 50 | 50 | 50 | 50 | 50 | 50 | 50 | 50 |
| 36 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 2 |
| 37 | 92 | 92 | 92 | 92 | 92 | 92 | 92 | 92 | 92 | 92 | 92 | 92 | 92 | 92 | 92 | 92 | 92 | 92 | 92 | 92 | 92 | 92 | 92 | 92 | 92 | 92 | 92 | 92 | 92 | 92 | 92 | 92 | 92 | 92 | 92 | 92 | 92 | 92 | 92 | 92 | 92 | 92 | 92 | 92 | 92 | 92 | 92 |
| 38 | 4 | 4 | 4 | 4 | 4 | 4 | 4 | 4 | 4 | 4 | 4 | 4 | 4 | 4 | 4 | 4 | 4 | 4 | 4 | 4 | 4 | 4 | 4 | 4 | 4 | 4 | 4 | 4 | 4 | 4 | 4 | 4 | 4 | 4 | 4 | 4 | 4 | 4 | 4 | 4 | 4 | 4 | 4 | 4 | 4 | 4 | 4 |
| 39 | 15 | 15 | 15 | 15 | 15 | 15 | 15 | 15 | 15 | 15 | 15 | 15 | 15 | 15 | 15 | 15 | 15 | 15 | 15 | 15 | 15 | 15 | 15 | 15 | 15 | 15 | 15 | 15 | 15 | 15 | 15 | 15 | 15 | 15 | 15 | 15 | 15 | 15 | 15 | 15 | 15 | 15 | 15 | 15 | 15 | 15 | 15 |
| 40 | 219 | 219 | 219 | 219 | 219 | 219 | 219 | 219 | 219 | 219 | 219 | 219 | 219 | 219 | 219 | 219 | 219 | 219 | 219 | 219 | 219 | 219 | 219 | 219 | 219 | 219 | 219 | 219 | 219 | 219 | 219 | 219 | 219 | 219 | 219 | 219 | 219 | 219 | 219 | 219 | 219 | 219 | 219 | 219 | 219 | 219 | 219 |
| 41 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 2 |
| 42 | 56 | 56 | 56 | 56 | 56 | 56 | 56 | 56 | 56 | 56 | 56 | 56 | 56 | 56 | 56 | 56 | 56 | 56 | 56 | 56 | 56 | 56 | 56 | 56 | 56 | 56 | 56 | 56 | 56 | 56 | 56 | 56 | 56 | 56 | 56 | 56 | 56 | 56 | 56 | 56 | 56 | 56 | 56 | 56 | 56 | 56 | 56 |
| 43 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 |
| 44 | 468 | 468 | 468 | 468 | 468 | 468 | 468 | 468 | 468 | 468 | 468 | 468 | 468 | 468 | 468 | 468 | 468 | 468 | 468 | 468 | 468 | 468 | 468 | 468 | 468 | 468 | 468 | 468 | 468 | 468 | 468 | 468 | 468 | 468 | 468 | 468 | 468 | 468 | 468 | 468 | 468 | 468 | 468 | 468 | 468 | 468 | 468 |
| 45 | 5 | 5 | 5 | 5 | 5 | 5 | 5 | 5 | 5 | 5 | 5 | 5 | 5 | 5 | 5 | 5 | 5 | 5 | 5 | 5 | 5 | 5 | 5 | 5 | 5 | 5 | 5 | 5 | 5 | 5 | 5 | 5 | 5 | 5 | 5 | 5 | 5 | 5 | 5 | 5 | 5 | 5 | 5 | 5 | 5 | 5 | 5 |
| 46 | 48 | 48 | 48 | 48 | 48 | 48 | 48 | 48 | 48 | 48 | 48 | 48 | 48 | 48 | 48 | 48 | 48 | 48 | 48 | 48 | 48 | 48 | 48 | 48 | 48 | 48 | 48 | 48 | 48 | 48 | 48 | 48 | 48 | 48 | 48 | 48 | 48 | 48 | 48 | 48 | 48 | 48 | 48 | 48 | 48 | 48 | 48 |
| 47 | 499 | 499 | 499 | 499 | 499 | 499 | 499 | 499 | 499 | 499 | 499 | 499 | 499 | 499 | 499 | 499 | 499 | 499 | 499 | 499 | 499 | 499 | 499 | 499 | 499 | 499 | 499 | 499 | 499 | 499 | 499 | 499 | 499 | 499 | 499 | 499 | 499 | 499 | 499 | 499 | 499 | 499 | 499 | 499 | 499 | 499 | 499 |
| 48 | 600 | 600 | 600 | 600 | 600 | 600 | 600 | 600 | 600 | 600 | 600 | 600 | 600 | 600 | 600 | 600 | 600 | 600 | 600 | 600 | 600 | 600 | 600 | 600 | 600 | 600 | 600 | 600 | 600 | 600 | 600 | 600 | 600 | 600 | 600 | 600 | 600 | 600 | 600 | 600 | 600 | 600 | 600 | 600 | 600 | 600 | 600 |
| 49 | 29 | 29 | 29 | 29 | 29 | 29 | 29 | 29 | 29 | 29 | 29 | 29 | 29 | 29 | 29 | 29 | 29 | 29 | 29 | 29 | 29 | 29 | 29 | 29 | 29 | 29 | 29 | 29 | 29 | 29 | 29 | 29 | 29 | 29 | 29 | 29 | 29 | 29 | 29 | 29 | 29 | 29 | 29 | 29 | 29 | 29 | 29 |
| 50 | 10 | 10 | 10 | 10 | 10 | 10 | 10 | 10 | 10 | 10 | 10 | 10 | 10 | 10 | 10 | 10 | 10 | 10 | 10 | 10 | 10 | 10 | 10 | 10 | 10 | 10 | 10 | 10 | 10 | 10 | 10 | 10 | 10 | 10 | 10 | 10 | 10 | 10 | 10 | 10 | 10 | 10 | 10 | 10 | 10 | 10 | 10 |
| 51 | 19 | 19 | 19 | 19 | 19 | 19 | 19 | 19 | 19 | 19 | 19 | 19 | 19 | 19 | 19 | 19 | 19 | 19 | 19 | 19 | 19 | 19 | 19 | 19 | 19 | 19 | 19 | 19 | 19 | 19 | 19 | 19 | 19 | 19 | 19 | 19 | 19 | 19 | 19 | 19 | 19 | 19 | 19 | 19 | 19 | 19 | 19 |
| 52 | 49 | 49 | 49 | 49 | 49 | 49 | 49 | 49 | 49 | 49 | 49 | 49 | 49 | 49 | 49 | 49 | 49 | 49 | 49 | 49 | 49 | 49 | 49 | 49 | 49 | 49 | 49 | 49 | 49 | 49 | 49 | 49 | 49 | 49 | 49 | 49 | 49 | 49 | 49 | 49 | 49 | 49 | 49 | 49 | 49 | 49 | 49 |
| 53 | 359 | 359 | 359 | 359 | 359 | 359 | 359 | 359 | 359 | 359 | 359 | 359 | 359 | 359 | 359 | 359 | 359 | 359 | 359 | 359 | 359 | 359 | 359 | 359 | 359 | 359 | 359 | 359 | 359 | 359 | 359 | 359 | 359 | 359 | 359 | 359 | 359 | 359 | 359 | 359 | 359 | 359 | 359 | 359 | 359 | 359 | 359 |
| 54 | 6 | 6 | 6 | 6 | 6 | 6 | 6 | 6 | 6 | 6 | 6 | 6 | 6 | 6 | 6 | 6 | 6 | 6 | 6 | 6 | 6 | 6 | 6 | 6 | 6 | 6 | 6 | 6 | 6 | 6 | 6 | 6 | 6 | 6 | 6 | 6 | 6 | 6 | 6 | 6 | 6 | 6 | 6 | 6 | 6 | 6 | 6 |
| 55 | 1703 | 1703 | 1703 | 1703 | 1703 | 1703 | 1703 | 1703 | 1703 | 1703 | 1703 | 1703 | 1703 | 1703 | 1703 | 1703 | 1703 | 1703 | 1703 | 1703 | 1703 | 1703 | 1703 | 1703 | 1703 | 1703 | 1703 | 1703 | 1703 | 1703 | 1703 | 1703 | 1703 | 1703 | 1703 | 1703 | 1703 | 1703 | 1703 | 1703 | 1703 | 1703 | 1703 | 1703 | 1703 | 1703 | 1703 |
| 56 | 6 | 6 | 6 | 6 | 6 | 6 | 6 | 6 | 6 | 6 | 6 | 6 | 6 | 6 | 6 | 6 | 6 | 6 | 6 | 6 | 6 | 6 | 6 | 6 | 6 | 6 | 6 | 6 | 6 | 6 | 6 | 6 | 6 | 6 | 6 | 6 | 6 | 6 | 6 | 6 | 6 | 6 | 6 | 6 | 6 | 6 | 6 |
| 57 | 269 | 269 | 269 | 269 | 269 | 269 | 269 | 269 | 269 | 269 | 269 | 269 | 269 | 269 | 269 | 269 | 269 | 269 | 269 | 269 | 269 | 269 | 269 | 269 | 269 | 269 | 269 | 269 | 269 | 269 | 269 | 269 | 269 | 269 | 269 | 269 | 269 | 269 | 269 | 269 | 269 | 269 | 269 | 269 | 269 | 269 | 269 |
| 58 | 5 | 5 | 5 | 5 | 5 | 5 | 5 | 5 | 5 | 5 | 5 | 5 | 5 | 5 | 5 | 5 | 5 | 5 | 5 | 5 | 5 | 5 | 5 | 5 | 5 | 5 | 5 | 5 | 5 | 5 | 5 | 5 | 5 | 5 | 5 | 5 | 5 | 5 | 5 | 5 | 5 | 5 | 5 | 5 | 5 | 5 | 5 |
| 59 | 18 | 18 | 18 | 18 | 18 | 18 | 18 | 18 | 18 | 18 | 18 | 18 | 18 | 18 | 18 | 18 | 18 | 18 | 18 | 18 | 18 | 18 | 18 | 18 | 18 | 18 | 18 | 18 | 18 | 18 | 18 | 18 | 18 | 18 | 18 | 18 | 18 | 18 | 18 | 18 | 18 | 18 | 18 | 18 | 18 | 18 | 18 |
| 60 | 55 | 55 | 55 | 55 | 55 | 55 | 55 | 55 | 55 | 55 | 55 | 55 | 55 | 55 | 55 | 55 | 55 | 55 | 55 | 55 | 55 | 55 | 55 | 55 | 55 | 55 | 55 | 55 | 55 | 55 | 55 | 55 | 55 | 55 | 55 | 55 | 55 | 55 | 55 | 55 | 55 | 55 | 55 | 55 | 55 | 55 | 55 |
| 61 | 287 | 287 | 287 | 287 | 287 | 287 | 287 | 287 | 287 | 287 | 287 | 287 | 287 | 287 | 287 | 287 | 287 | 287 | 287 | 287 | 287 | 287 | 287 | 287 | 287 | 287 | 287 | 287 | 287 | 287 | 287 | 287 | 287 | 287 | 287 | 287 | 287 | 287 | 287 | 287 | 287 | 287 | 287 | 287 | 287 | 287 | 287 |
| 62 | 22 | 22 | 22 | 22 | 22 | 22 | 22 | 22 | 22 | 22 | 22 | 22 | 22 | 22 | 22 | 22 | 22 | 22 | 22 | 22 | 22 | 22 | 22 | 22 | 22 | 22 | 22 | 22 | 22 | 22 | 22 | 22 | 22 | 22 | 22 | 22 | 22 | 22 | 22 | 22 | 22 | 22 | 22 | 22 | 22 | 22 | 22 |
| 63 | 372 | 372 | 372 | 372 | 372 | 372 | 372 | 372 | 372 | 372 | 372 | 372 | 372 | 372 | 372 | 372 | 372 | 372 | 372 | 372 | 372 | 372 | 372 | 372 | 372 | 372 | 372 | 372 | 372 | 372 | 372 | 372 | 372 | 372 | 372 | 372 | 372 | 372 | 372 | 372 | 372 | 372 | 372 | 372 | 372 | 372 | 372 |
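Because `count()` reports the non-null count for every remaining column, each row above repeats one number 47 times. When no column has missing values, as the identical counts in each row suggest, a more compact view of the same distribution could come from `value_counts()`. The sketch below is only an illustration: it uses a small stand-in frame (`toy`, with invented values) rather than the real `df`, and the helper name `category_distribution` is hypothetical.

```python
import pandas as pd


def category_distribution(frame, column):
    """Return one row per category code with its row count and percentage share."""
    # value_counts() gives a single count per code; sort_index() orders by code.
    counts = frame[column].value_counts().sort_index()
    summary = pd.DataFrame({
        "n_rows": counts,
        "pct": (100.0 * counts / counts.sum()).round(2),
    })
    summary.index.name = column
    return summary


# Toy stand-in for df (values are invented for illustration only).
toy = pd.DataFrame({"MED_SPEC_NUM": [0, 0, 0, 4, 4, 8, 18, 18, 18, 18]})
dist = category_distribution(toy, "MED_SPEC_NUM")
print(dist)
```

On the real data the call would be `category_distribution(df, 'MED_SPEC_NUM')`; the percentage column makes it easier to see how many specialty codes hold only a handful of rows.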
#distribution of diagnosis categories in the DIAG_CAT_1 column
df.groupby('DIAG_CAT_1').count()
| DIAG_CAT_1 | race | gender | age | weight | admission_type_id | discharge_disposition_id | admission_source_id | time_in_hospital | payer_code | MED_SPEC_NUM | num_lab_procedures | num_procedures | num_medications | number_outpatient | number_emergency | number_inpatient | DIAG_CAT_2 | DIAG_CAT_3 | number_diagnoses | max_glu_serum | A1Cresult | metformin | repaglinide | nateglinide | chlorpropamide | glimepiride | acetohexamide | glipizide | glyburide | tolbutamide | pioglitazone | rosiglitazone | acarbose | miglitol | troglitazone | tolazamide | examide | citoglipton | insulin | glyburide.metformin | glipizide.metformin | glimepiride.pioglitazone | metformin.rosiglitazone | metformin.pioglitazone | change | diabetesMed | readmitted |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 11 | 11 | 11 | 11 | 11 | 11 | 11 | 11 | 11 | 11 | 11 | 11 | 11 | 11 | 11 | 11 | 11 | 11 | 11 | 11 | 11 | 11 | 11 | 11 | 11 | 11 | 11 | 11 | 11 | 11 | 11 | 11 | 11 | 11 | 11 | 11 | 11 | 11 | 11 | 11 | 11 | 11 | 11 | 11 | 11 | 11 | 11 |
| 1 | 72 | 72 | 72 | 72 | 72 | 72 | 72 | 72 | 72 | 72 | 72 | 72 | 72 | 72 | 72 | 72 | 72 | 72 | 72 | 72 | 72 | 72 | 72 | 72 | 72 | 72 | 72 | 72 | 72 | 72 | 72 | 72 | 72 | 72 | 72 | 72 | 72 | 72 | 72 | 72 | 72 | 72 | 72 | 72 | 72 | 72 | 72 |
| 2 | 1900 | 1900 | 1900 | 1900 | 1900 | 1900 | 1900 | 1900 | 1900 | 1900 | 1900 | 1900 | 1900 | 1900 | 1900 | 1900 | 1900 | 1900 | 1900 | 1900 | 1900 | 1900 | 1900 | 1900 | 1900 | 1900 | 1900 | 1900 | 1900 | 1900 | 1900 | 1900 | 1900 | 1900 | 1900 | 1900 | 1900 | 1900 | 1900 | 1900 | 1900 | 1900 | 1900 | 1900 | 1900 | 1900 | 1900 |
| 3 | 6311 | 6311 | 6311 | 6311 | 6311 | 6311 | 6311 | 6311 | 6311 | 6311 | 6311 | 6311 | 6311 | 6311 | 6311 | 6311 | 6311 | 6311 | 6311 | 6311 | 6311 | 6311 | 6311 | 6311 | 6311 | 6311 | 6311 | 6311 | 6311 | 6311 | 6311 | 6311 | 6311 | 6311 | 6311 | 6311 | 6311 | 6311 | 6311 | 6311 | 6311 | 6311 | 6311 | 6311 | 6311 | 6311 | 6311 |
| 4 | 624 | 624 | 624 | 624 | 624 | 624 | 624 | 624 | 624 | 624 | 624 | 624 | 624 | 624 | 624 | 624 | 624 | 624 | 624 | 624 | 624 | 624 | 624 | 624 | 624 | 624 | 624 | 624 | 624 | 624 | 624 | 624 | 624 | 624 | 624 | 624 | 624 | 624 | 624 | 624 | 624 | 624 | 624 | 624 | 624 | 624 | 624 |
| 5 | 1233 | 1233 | 1233 | 1233 | 1233 | 1233 | 1233 | 1233 | 1233 | 1233 | 1233 | 1233 | 1233 | 1233 | 1233 | 1233 | 1233 | 1233 | 1233 | 1233 | 1233 | 1233 | 1233 | 1233 | 1233 | 1233 | 1233 | 1233 | 1233 | 1233 | 1233 | 1233 | 1233 | 1233 | 1233 | 1233 | 1233 | 1233 | 1233 | 1233 | 1233 | 1233 | 1233 | 1233 | 1233 | 1233 | 1233 |
| 6 | 1632 | 1632 | 1632 | 1632 | 1632 | 1632 | 1632 | 1632 | 1632 | 1632 | 1632 | 1632 | 1632 | 1632 | 1632 | 1632 | 1632 | 1632 | 1632 | 1632 | 1632 | 1632 | 1632 | 1632 | 1632 | 1632 | 1632 | 1632 | 1632 | 1632 | 1632 | 1632 | 1632 | 1632 | 1632 | 1632 | 1632 | 1632 | 1632 | 1632 | 1632 | 1632 | 1632 | 1632 | 1632 | 1632 | 1632 |
| 7 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 2 |
| 8 | 89 | 89 | 89 | 89 | 89 | 89 | 89 | 89 | 89 | 89 | 89 | 89 | 89 | 89 | 89 | 89 | 89 | 89 | 89 | 89 | 89 | 89 | 89 | 89 | 89 | 89 | 89 | 89 | 89 | 89 | 89 | 89 | 89 | 89 | 89 | 89 | 89 | 89 | 89 | 89 | 89 | 89 | 89 | 89 | 89 | 89 | 89 |
| 9 | 824 | 824 | 824 | 824 | 824 | 824 | 824 | 824 | 824 | 824 | 824 | 824 | 824 | 824 | 824 | 824 | 824 | 824 | 824 | 824 | 824 | 824 | 824 | 824 | 824 | 824 | 824 | 824 | 824 | 824 | 824 | 824 | 824 | 824 | 824 | 824 | 824 | 824 | 824 | 824 | 824 | 824 | 824 | 824 | 824 | 824 | 824 |
| 10 | 5774 | 5774 | 5774 | 5774 | 5774 | 5774 | 5774 | 5774 | 5774 | 5774 | 5774 | 5774 | 5774 | 5774 | 5774 | 5774 | 5774 | 5774 | 5774 | 5774 | 5774 | 5774 | 5774 | 5774 | 5774 | 5774 | 5774 | 5774 | 5774 | 5774 | 5774 | 5774 | 5774 | 5774 | 5774 | 5774 | 5774 | 5774 | 5774 | 5774 | 5774 | 5774 | 5774 | 5774 | 5774 | 5774 | 5774 |
| 11 | 292 | 292 | 292 | 292 | 292 | 292 | 292 | 292 | 292 | 292 | 292 | 292 | 292 | 292 | 292 | 292 | 292 | 292 | 292 | 292 | 292 | 292 | 292 | 292 | 292 | 292 | 292 | 292 | 292 | 292 | 292 | 292 | 292 | 292 | 292 | 292 | 292 | 292 | 292 | 292 | 292 | 292 | 292 | 292 | 292 | 292 | 292 |
| 12 | 1969 | 1969 | 1969 | 1969 | 1969 | 1969 | 1969 | 1969 | 1969 | 1969 | 1969 | 1969 | 1969 | 1969 | 1969 | 1969 | 1969 | 1969 | 1969 | 1969 | 1969 | 1969 | 1969 | 1969 | 1969 | 1969 | 1969 | 1969 | 1969 | 1969 | 1969 | 1969 | 1969 | 1969 | 1969 | 1969 | 1969 | 1969 | 1969 | 1969 | 1969 | 1969 | 1969 | 1969 | 1969 | 1969 | 1969 |
| 13 | 3766 | 3766 | 3766 | 3766 | 3766 | 3766 | 3766 | 3766 | 3766 | 3766 | 3766 | 3766 | 3766 | 3766 | 3766 | 3766 | 3766 | 3766 | 3766 | 3766 | 3766 | 3766 | 3766 | 3766 | 3766 | 3766 | 3766 | 3766 | 3766 | 3766 | 3766 | 3766 | 3766 | 3766 | 3766 | 3766 | 3766 | 3766 | 3766 | 3766 | 3766 | 3766 | 3766 | 3766 | 3766 | 3766 | 3766 |
| 14 | 2457 | 2457 | 2457 | 2457 | 2457 | 2457 | 2457 | 2457 | 2457 | 2457 | 2457 | 2457 | 2457 | 2457 | 2457 | 2457 | 2457 | 2457 | 2457 | 2457 | 2457 | 2457 | 2457 | 2457 | 2457 | 2457 | 2457 | 2457 | 2457 | 2457 | 2457 | 2457 | 2457 | 2457 | 2457 | 2457 | 2457 | 2457 | 2457 | 2457 | 2457 | 2457 | 2457 | 2457 | 2457 | 2457 | 2457 |
| 15 | 1423 | 1423 | 1423 | 1423 | 1423 | 1423 | 1423 | 1423 | 1423 | 1423 | 1423 | 1423 | 1423 | 1423 | 1423 | 1423 | 1423 | 1423 | 1423 | 1423 | 1423 | 1423 | 1423 | 1423 | 1423 | 1423 | 1423 | 1423 | 1423 | 1423 | 1423 | 1423 | 1423 | 1423 | 1423 | 1423 | 1423 | 1423 | 1423 | 1423 | 1423 | 1423 | 1423 | 1423 | 1423 | 1423 | 1423 |
| 16 | 5797 | 5797 | 5797 | 5797 | 5797 | 5797 | 5797 | 5797 | 5797 | 5797 | 5797 | 5797 | 5797 | 5797 | 5797 | 5797 | 5797 | 5797 | 5797 | 5797 | 5797 | 5797 | 5797 | 5797 | 5797 | 5797 | 5797 | 5797 | 5797 | 5797 | 5797 | 5797 | 5797 | 5797 | 5797 | 5797 | 5797 | 5797 | 5797 | 5797 | 5797 | 5797 | 5797 | 5797 | 5797 | 5797 | 5797 |
| 17 | 5124 | 5124 | 5124 | 5124 | 5124 | 5124 | 5124 | 5124 | 5124 | 5124 | 5124 | 5124 | 5124 | 5124 | 5124 | 5124 | 5124 | 5124 | 5124 | 5124 | 5124 | 5124 | 5124 | 5124 | 5124 | 5124 | 5124 | 5124 | 5124 | 5124 | 5124 | 5124 | 5124 | 5124 | 5124 | 5124 | 5124 | 5124 | 5124 | 5124 | 5124 | 5124 | 5124 | 5124 | 5124 | 5124 | 5124 |
| 18 | 2825 | 2825 | 2825 | 2825 | 2825 | 2825 | 2825 | 2825 | 2825 | 2825 | 2825 | 2825 | 2825 | 2825 | 2825 | 2825 | 2825 | 2825 | 2825 | 2825 | 2825 | 2825 | 2825 | 2825 | 2825 | 2825 | 2825 | 2825 | 2825 | 2825 | 2825 | 2825 | 2825 | 2825 | 2825 | 2825 | 2825 | 2825 | 2825 | 2825 | 2825 | 2825 | 2825 | 2825 | 2825 | 2825 | 2825 |
| 19 | 367 | 367 | 367 | 367 | 367 | 367 | 367 | 367 | 367 | 367 | 367 | 367 | 367 | 367 | 367 | 367 | 367 | 367 | 367 | 367 | 367 | 367 | 367 | 367 | 367 | 367 | 367 | 367 | 367 | 367 | 367 | 367 | 367 | 367 | 367 | 367 | 367 | 367 | 367 | 367 | 367 | 367 | 367 | 367 | 367 | 367 | 367 |
| 20 | 1462 | 1462 | 1462 | 1462 | 1462 | 1462 | 1462 | 1462 | 1462 | 1462 | 1462 | 1462 | 1462 | 1462 | 1462 | 1462 | 1462 | 1462 | 1462 | 1462 | 1462 | 1462 | 1462 | 1462 | 1462 | 1462 | 1462 | 1462 | 1462 | 1462 | 1462 | 1462 | 1462 | 1462 | 1462 | 1462 | 1462 | 1462 | 1462 | 1462 | 1462 | 1462 | 1462 | 1462 | 1462 | 1462 | 1462 |
| 21 | 2677 | 2677 | 2677 | 2677 | 2677 | 2677 | 2677 | 2677 | 2677 | 2677 | 2677 | 2677 | 2677 | 2677 | 2677 | 2677 | 2677 | 2677 | 2677 | 2677 | 2677 | 2677 | 2677 | 2677 | 2677 | 2677 | 2677 | 2677 | 2677 | 2677 | 2677 | 2677 | 2677 | 2677 | 2677 | 2677 | 2677 | 2677 | 2677 | 2677 | 2677 | 2677 | 2677 | 2677 | 2677 | 2677 | 2677 |
| 22 | 31 | 31 | 31 | 31 | 31 | 31 | 31 | 31 | 31 | 31 | 31 | 31 | 31 | 31 | 31 | 31 | 31 | 31 | 31 | 31 | 31 | 31 | 31 | 31 | 31 | 31 | 31 | 31 | 31 | 31 | 31 | 31 | 31 | 31 | 31 | 31 | 31 | 31 | 31 | 31 | 31 | 31 | 31 | 31 | 31 | 31 | 31 |
| 23 | 4273 | 4273 | 4273 | 4273 | 4273 | 4273 | 4273 | 4273 | 4273 | 4273 | 4273 | 4273 | 4273 | 4273 | 4273 | 4273 | 4273 | 4273 | 4273 | 4273 | 4273 | 4273 | 4273 | 4273 | 4273 | 4273 | 4273 | 4273 | 4273 | 4273 | 4273 | 4273 | 4273 | 4273 | 4273 | 4273 | 4273 | 4273 | 4273 | 4273 | 4273 | 4273 | 4273 | 4273 | 4273 | 4273 | 4273 |
| 24 | 2184 | 2184 | 2184 | 2184 | 2184 | 2184 | 2184 | 2184 | 2184 | 2184 | 2184 | 2184 | 2184 | 2184 | 2184 | 2184 | 2184 | 2184 | 2184 | 2184 | 2184 | 2184 | 2184 | 2184 | 2184 | 2184 | 2184 | 2184 | 2184 | 2184 | 2184 | 2184 | 2184 | 2184 | 2184 | 2184 | 2184 | 2184 | 2184 | 2184 | 2184 | 2184 | 2184 | 2184 | 2184 | 2184 | 2184 |
| 25 | 154 | 154 | 154 | 154 | 154 | 154 | 154 | 154 | 154 | 154 | 154 | 154 | 154 | 154 | 154 | 154 | 154 | 154 | 154 | 154 | 154 | 154 | 154 | 154 | 154 | 154 | 154 | 154 | 154 | 154 | 154 | 154 | 154 | 154 | 154 | 154 | 154 | 154 | 154 | 154 | 154 | 154 | 154 | 154 | 154 | 154 | 154 |
| 26 | 49 | 49 | 49 | 49 | 49 | 49 | 49 | 49 | 49 | 49 | 49 | 49 | 49 | 49 | 49 | 49 | 49 | 49 | 49 | 49 | 49 | 49 | 49 | 49 | 49 | 49 | 49 | 49 | 49 | 49 | 49 | 49 | 49 | 49 | 49 | 49 | 49 | 49 | 49 | 49 | 49 | 49 | 49 | 49 | 49 | 49 | 49 |
| 27 | 1067 | 1067 | 1067 | 1067 | 1067 | 1067 | 1067 | 1067 | 1067 | 1067 | 1067 | 1067 | 1067 | 1067 | 1067 | 1067 | 1067 | 1067 | 1067 | 1067 | 1067 | 1067 | 1067 | 1067 | 1067 | 1067 | 1067 | 1067 | 1067 | 1067 | 1067 | 1067 | 1067 | 1067 | 1067 | 1067 | 1067 | 1067 | 1067 | 1067 | 1067 | 1067 | 1067 | 1067 | 1067 | 1067 | 1067 |
| 28 | 684 | 684 | 684 | 684 | 684 | 684 | 684 | 684 | 684 | 684 | 684 | 684 | 684 | 684 | 684 | 684 | 684 | 684 | 684 | 684 | 684 | 684 | 684 | 684 | 684 | 684 | 684 | 684 | 684 | 684 | 684 | 684 | 684 | 684 | 684 | 684 | 684 | 684 | 684 | 684 | 684 | 684 | 684 | 684 | 684 | 684 | 684 |
| 31 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 |
| 32 | 926 | 926 | 926 | 926 | 926 | 926 | 926 | 926 | 926 | 926 | 926 | 926 | 926 | 926 | 926 | 926 | 926 | 926 | 926 | 926 | 926 | 926 | 926 | 926 | 926 | 926 | 926 | 926 | 926 | 926 | 926 | 926 | 926 | 926 | 926 | 926 | 926 | 926 | 926 | 926 | 926 | 926 | 926 | 926 | 926 | 926 | 926 |
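The index above jumps from 28 to 31, so codes 29 and 30 have no rows in DIAG_CAT_1. One way to make such gaps explicit is to reindex the counts over the full range of codes. The sketch below uses an invented Series and a hypothetical helper, `missing_codes`, and it assumes the codes are meant to be consecutive integers, which the notebook does not state.

```python
import pandas as pd


def missing_codes(series):
    """List integer codes between the min and max that never occur in the series."""
    counts = series.value_counts()
    full_range = range(int(series.min()), int(series.max()) + 1)
    # Reindexing over every possible code fills unobserved codes with 0.
    filled = counts.reindex(full_range, fill_value=0)
    return [int(code) for code in filled[filled == 0].index]


# Toy stand-in for df['DIAG_CAT_1'] (invented values, not the real data).
toy_codes = pd.Series([27, 28, 28, 31, 32, 32, 32])
print(missing_codes(toy_codes))
```

Knowing which codes are absent matters later if the column is one-hot encoded or compared against a lookup table of diagnosis categories.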
#distribution of diagnosis categories in the DIAG_CAT_2 column
df.groupby('DIAG_CAT_2').count()
| DIAG_CAT_2 | race | gender | age | weight | admission_type_id | discharge_disposition_id | admission_source_id | time_in_hospital | payer_code | MED_SPEC_NUM | num_lab_procedures | num_procedures | num_medications | number_outpatient | number_emergency | number_inpatient | DIAG_CAT_1 | DIAG_CAT_3 | number_diagnoses | max_glu_serum | A1Cresult | metformin | repaglinide | nateglinide | chlorpropamide | glimepiride | acetohexamide | glipizide | glyburide | tolbutamide | pioglitazone | rosiglitazone | acarbose | miglitol | troglitazone | tolazamide | examide | citoglipton | insulin | glyburide.metformin | glipizide.metformin | glimepiride.pioglitazone | metformin.rosiglitazone | metformin.pioglitazone | change | diabetesMed | readmitted |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 180 | 180 | 180 | 180 | 180 | 180 | 180 | 180 | 180 | 180 | 180 | 180 | 180 | 180 | 180 | 180 | 180 | 180 | 180 | 180 | 180 | 180 | 180 | 180 | 180 | 180 | 180 | 180 | 180 | 180 | 180 | 180 | 180 | 180 | 180 | 180 | 180 | 180 | 180 | 180 | 180 | 180 | 180 | 180 | 180 | 180 | 180 |
| 1 | 193 | 193 | 193 | 193 | 193 | 193 | 193 | 193 | 193 | 193 | 193 | 193 | 193 | 193 | 193 | 193 | 193 | 193 | 193 | 193 | 193 | 193 | 193 | 193 | 193 | 193 | 193 | 193 | 193 | 193 | 193 | 193 | 193 | 193 | 193 | 193 | 193 | 193 | 193 | 193 | 193 | 193 | 193 | 193 | 193 | 193 | 193 |
| 2 | 1392 | 1392 | 1392 | 1392 | 1392 | 1392 | 1392 | 1392 | 1392 | 1392 | 1392 | 1392 | 1392 | 1392 | 1392 | 1392 | 1392 | 1392 | 1392 | 1392 | 1392 | 1392 | 1392 | 1392 | 1392 | 1392 | 1392 | 1392 | 1392 | 1392 | 1392 | 1392 | 1392 | 1392 | 1392 | 1392 | 1392 | 1392 | 1392 | 1392 | 1392 | 1392 | 1392 | 1392 | 1392 | 1392 | 1392 |
| 3 | 11530 | 11530 | 11530 | 11530 | 11530 | 11530 | 11530 | 11530 | 11530 | 11530 | 11530 | 11530 | 11530 | 11530 | 11530 | 11530 | 11530 | 11530 | 11530 | 11530 | 11530 | 11530 | 11530 | 11530 | 11530 | 11530 | 11530 | 11530 | 11530 | 11530 | 11530 | 11530 | 11530 | 11530 | 11530 | 11530 | 11530 | 11530 | 11530 | 11530 | 11530 | 11530 | 11530 | 11530 | 11530 | 11530 | 11530 |
| 4 | 1649 | 1649 | 1649 | 1649 | 1649 | 1649 | 1649 | 1649 | 1649 | 1649 | 1649 | 1649 | 1649 | 1649 | 1649 | 1649 | 1649 | 1649 | 1649 | 1649 | 1649 | 1649 | 1649 | 1649 | 1649 | 1649 | 1649 | 1649 | 1649 | 1649 | 1649 | 1649 | 1649 | 1649 | 1649 | 1649 | 1649 | 1649 | 1649 | 1649 | 1649 | 1649 | 1649 | 1649 | 1649 | 1649 | 1649 |
| 5 | 1436 | 1436 | 1436 | 1436 | 1436 | 1436 | 1436 | 1436 | 1436 | 1436 | 1436 | 1436 | 1436 | 1436 | 1436 | 1436 | 1436 | 1436 | 1436 | 1436 | 1436 | 1436 | 1436 | 1436 | 1436 | 1436 | 1436 | 1436 | 1436 | 1436 | 1436 | 1436 | 1436 | 1436 | 1436 | 1436 | 1436 | 1436 | 1436 | 1436 | 1436 | 1436 | 1436 | 1436 | 1436 | 1436 | 1436 |
| 6 | 964 | 964 | 964 | 964 | 964 | 964 | 964 | 964 | 964 | 964 | 964 | 964 | 964 | 964 | 964 | 964 | 964 | 964 | 964 | 964 | 964 | 964 | 964 | 964 | 964 | 964 | 964 | 964 | 964 | 964 | 964 | 964 | 964 | 964 | 964 | 964 | 964 | 964 | 964 | 964 | 964 | 964 | 964 | 964 | 964 | 964 | 964 |
| 8 | 188 | 188 | 188 | 188 | 188 | 188 | 188 | 188 | 188 | 188 | 188 | 188 | 188 | 188 | 188 | 188 | 188 | 188 | 188 | 188 | 188 | 188 | 188 | 188 | 188 | 188 | 188 | 188 | 188 | 188 | 188 | 188 | 188 | 188 | 188 | 188 | 188 | 188 | 188 | 188 | 188 | 188 | 188 | 188 | 188 | 188 | 188 |
| 9 | 3946 | 3946 | 3946 | 3946 | 3946 | 3946 | 3946 | 3946 | 3946 | 3946 | 3946 | 3946 | 3946 | 3946 | 3946 | 3946 | 3946 | 3946 | 3946 | 3946 | 3946 | 3946 | 3946 | 3946 | 3946 | 3946 | 3946 | 3946 | 3946 | 3946 | 3946 | 3946 | 3946 | 3946 | 3946 | 3946 | 3946 | 3946 | 3946 | 3946 | 3946 | 3946 | 3946 | 3946 | 3946 | 3946 | 3946 |
| 10 | 3984 | 3984 | 3984 | 3984 | 3984 | 3984 | 3984 | 3984 | 3984 | 3984 | 3984 | 3984 | 3984 | 3984 | 3984 | 3984 | 3984 | 3984 | 3984 | 3984 | 3984 | 3984 | 3984 | 3984 | 3984 | 3984 | 3984 | 3984 | 3984 | 3984 | 3984 | 3984 | 3984 | 3984 | 3984 | 3984 | 3984 | 3984 | 3984 | 3984 | 3984 | 3984 | 3984 | 3984 | 3984 | 3984 | 3984 |
| 11 | 113 | 113 | 113 | 113 | 113 | 113 | 113 | 113 | 113 | 113 | 113 | 113 | 113 | 113 | 113 | 113 | 113 | 113 | 113 | 113 | 113 | 113 | 113 | 113 | 113 | 113 | 113 | 113 | 113 | 113 | 113 | 113 | 113 | 113 | 113 | 113 | 113 | 113 | 113 | 113 | 113 | 113 | 113 | 113 | 113 | 113 | 113 |
| 12 | 4402 | 4402 | 4402 | 4402 | 4402 | 4402 | 4402 | 4402 | 4402 | 4402 | 4402 | 4402 | 4402 | 4402 | 4402 | 4402 | 4402 | 4402 | 4402 | 4402 | 4402 | 4402 | 4402 | 4402 | 4402 | 4402 | 4402 | 4402 | 4402 | 4402 | 4402 | 4402 | 4402 | 4402 | 4402 | 4402 | 4402 | 4402 | 4402 | 4402 | 4402 | 4402 | 4402 | 4402 | 4402 | 4402 | 4402 |
| 13 | 3719 | 3719 | 3719 | 3719 | 3719 | 3719 | 3719 | 3719 | 3719 | 3719 | 3719 | 3719 | 3719 | 3719 | 3719 | 3719 | 3719 | 3719 | 3719 | 3719 | 3719 | 3719 | 3719 | 3719 | 3719 | 3719 | 3719 | 3719 | 3719 | 3719 | 3719 | 3719 | 3719 | 3719 | 3719 | 3719 | 3719 | 3719 | 3719 | 3719 | 3719 | 3719 | 3719 | 3719 | 3719 | 3719 | 3719 |
| 14 | 456 | 456 | 456 | 456 | 456 | 456 | 456 | 456 | 456 | 456 | 456 | 456 | 456 | 456 | 456 | 456 | 456 | 456 | 456 | 456 | 456 | 456 | 456 | 456 | 456 | 456 | 456 | 456 | 456 | 456 | 456 | 456 | 456 | 456 | 456 | 456 | 456 | 456 | 456 | 456 | 456 | 456 | 456 | 456 | 456 | 456 | 456 |
| 15 | 781 | 781 | 781 | 781 | 781 | 781 | 781 | 781 | 781 | 781 | 781 | 781 | 781 | 781 | 781 | 781 | 781 | 781 | 781 | 781 | 781 | 781 | 781 | 781 | 781 | 781 | 781 | 781 | 781 | 781 | 781 | 781 | 781 | 781 | 781 | 781 | 781 | 781 | 781 | 781 | 781 | 781 | 781 | 781 | 781 | 781 | 781 |
| 16 | 5578 | 5578 | 5578 | 5578 | 5578 | 5578 | 5578 | 5578 | 5578 | 5578 | 5578 | 5578 | 5578 | 5578 | 5578 | 5578 | 5578 | 5578 | 5578 | 5578 | 5578 | 5578 | 5578 | 5578 | 5578 | 5578 | 5578 | 5578 | 5578 | 5578 | 5578 | 5578 | 5578 | 5578 | 5578 | 5578 | 5578 | 5578 | 5578 | 5578 | 5578 | 5578 | 5578 | 5578 | 5578 | 5578 | 5578 |
| 17 | 2198 | 2198 | 2198 | 2198 | 2198 | 2198 | 2198 | 2198 | 2198 | 2198 | 2198 | 2198 | 2198 | 2198 | 2198 | 2198 | 2198 | 2198 | 2198 | 2198 | 2198 | 2198 | 2198 | 2198 | 2198 | 2198 | 2198 | 2198 | 2198 | 2198 | 2198 | 2198 | 2198 | 2198 | 2198 | 2198 | 2198 | 2198 | 2198 | 2198 | 2198 | 2198 | 2198 | 2198 | 2198 | 2198 | 2198 |
| 18 | 4365 | 4365 | 4365 | 4365 | 4365 | 4365 | 4365 | 4365 | 4365 | 4365 | 4365 | 4365 | 4365 | 4365 | 4365 | 4365 | 4365 | 4365 | 4365 | 4365 | 4365 | 4365 | 4365 | 4365 | 4365 | 4365 | 4365 | 4365 | 4365 | 4365 | 4365 | 4365 | 4365 | 4365 | 4365 | 4365 | 4365 | 4365 | 4365 | 4365 | 4365 | 4365 | 4365 | 4365 | 4365 | 4365 | 4365 |
| 19 | 234 | 234 | 234 | 234 | 234 | 234 | 234 | 234 | 234 | 234 | 234 | 234 | 234 | 234 | 234 | 234 | 234 | 234 | 234 | 234 | 234 | 234 | 234 | 234 | 234 | 234 | 234 | 234 | 234 | 234 | 234 | 234 | 234 | 234 | 234 | 234 | 234 | 234 | 234 | 234 | 234 | 234 | 234 | 234 | 234 | 234 | 234 |
| 20 | 2113 | 2113 | 2113 | 2113 | 2113 | 2113 | 2113 | 2113 | 2113 | 2113 | 2113 | 2113 | 2113 | 2113 | 2113 | 2113 | 2113 | 2113 | 2113 | 2113 | 2113 | 2113 | 2113 | 2113 | 2113 | 2113 | 2113 | 2113 | 2113 | 2113 | 2113 | 2113 | 2113 | 2113 | 2113 | 2113 | 2113 | 2113 | 2113 | 2113 | 2113 | 2113 | 2113 | 2113 | 2113 | 2113 | 2113 |
| 21 | 972 | 972 | 972 | 972 | 972 | 972 | 972 | 972 | 972 | 972 | 972 | 972 | 972 | 972 | 972 | 972 | 972 | 972 | 972 | 972 | 972 | 972 | 972 | 972 | 972 | 972 | 972 | 972 | 972 | 972 | 972 | 972 | 972 | 972 | 972 | 972 | 972 | 972 | 972 | 972 | 972 | 972 | 972 | 972 | 972 | 972 | 972 |
| 22 | 69 | 69 | 69 | 69 | 69 | 69 | 69 | 69 | 69 | 69 | 69 | 69 | 69 | 69 | 69 | 69 | 69 | 69 | 69 | 69 | 69 | 69 | 69 | 69 | 69 | 69 | 69 | 69 | 69 | 69 | 69 | 69 | 69 | 69 | 69 | 69 | 69 | 69 | 69 | 69 | 69 | 69 | 69 | 69 | 69 | 69 | 69 |
| 23 | 2635 | 2635 | 2635 | 2635 | 2635 | 2635 | 2635 | 2635 | 2635 | 2635 | 2635 | 2635 | 2635 | 2635 | 2635 | 2635 | 2635 | 2635 | 2635 | 2635 | 2635 | 2635 | 2635 | 2635 | 2635 | 2635 | 2635 | 2635 | 2635 | 2635 | 2635 | 2635 | 2635 | 2635 | 2635 | 2635 | 2635 | 2635 | 2635 | 2635 | 2635 | 2635 | 2635 | 2635 | 2635 | 2635 | 2635 |
| 24 | 595 | 595 | 595 | 595 | 595 | 595 | 595 | 595 | 595 | 595 | 595 | 595 | 595 | 595 | 595 | 595 | 595 | 595 | 595 | 595 | 595 | 595 | 595 | 595 | 595 | 595 | 595 | 595 | 595 | 595 | 595 | 595 | 595 | 595 | 595 | 595 | 595 | 595 | 595 | 595 | 595 | 595 | 595 | 595 | 595 | 595 | 595 |
| 25 | 11 | 11 | 11 | 11 | 11 | 11 | 11 | 11 | 11 | 11 | 11 | 11 | 11 | 11 | 11 | 11 | 11 | 11 | 11 | 11 | 11 | 11 | 11 | 11 | 11 | 11 | 11 | 11 | 11 | 11 | 11 | 11 | 11 | 11 | 11 | 11 | 11 | 11 | 11 | 11 | 11 | 11 | 11 | 11 | 11 | 11 | 11 |
| 26 | 77 | 77 | 77 | 77 | 77 | 77 | 77 | 77 | 77 | 77 | 77 | 77 | 77 | 77 | 77 | 77 | 77 | 77 | 77 | 77 | 77 | 77 | 77 | 77 | 77 | 77 | 77 | 77 | 77 | 77 | 77 | 77 | 77 | 77 | 77 | 77 | 77 | 77 | 77 | 77 | 77 | 77 | 77 | 77 | 77 | 77 | 77 |
| 27 | 269 | 269 | 269 | 269 | 269 | 269 | 269 | 269 | 269 | 269 | 269 | 269 | 269 | 269 | 269 | 269 | 269 | 269 | 269 | 269 | 269 | 269 | 269 | 269 | 269 | 269 | 269 | 269 | 269 | 269 | 269 | 269 | 269 | 269 | 269 | 269 | 269 | 269 | 269 | 269 | 269 | 269 | 269 | 269 | 269 | 269 | 269 |
| 28 | 540 | 540 | 540 | 540 | 540 | 540 | 540 | 540 | 540 | 540 | 540 | 540 | 540 | 540 | 540 | 540 | 540 | 540 | 540 | 540 | 540 | 540 | 540 | 540 | 540 | 540 | 540 | 540 | 540 | 540 | 540 | 540 | 540 | 540 | 540 | 540 | 540 | 540 | 540 | 540 | 540 | 540 | 540 | 540 | 540 | 540 | 540 |
| 31 | 396 | 396 | 396 | 396 | 396 | 396 | 396 | 396 | 396 | 396 | 396 | 396 | 396 | 396 | 396 | 396 | 396 | 396 | 396 | 396 | 396 | 396 | 396 | 396 | 396 | 396 | 396 | 396 | 396 | 396 | 396 | 396 | 396 | 396 | 396 | 396 | 396 | 396 | 396 | 396 | 396 | 396 | 396 | 396 | 396 | 396 | 396 |
| 32 | 1015 | 1015 | 1015 | 1015 | 1015 | 1015 | 1015 | 1015 | 1015 | 1015 | 1015 | 1015 | 1015 | 1015 | 1015 | 1015 | 1015 | 1015 | 1015 | 1015 | 1015 | 1015 | 1015 | 1015 | 1015 | 1015 | 1015 | 1015 | 1015 | 1015 | 1015 | 1015 | 1015 | 1015 | 1015 | 1015 | 1015 | 1015 | 1015 | 1015 | 1015 | 1015 | 1015 | 1015 | 1015 | 1015 | 1015 |
#distribution of diagnosis categories in the DIAG_CAT_3 column
df.groupby('DIAG_CAT_3').count()
| DIAG_CAT_3 | race | gender | age | weight | admission_type_id | discharge_disposition_id | admission_source_id | time_in_hospital | payer_code | MED_SPEC_NUM | num_lab_procedures | num_procedures | num_medications | number_outpatient | number_emergency | number_inpatient | DIAG_CAT_1 | DIAG_CAT_2 | number_diagnoses | max_glu_serum | A1Cresult | metformin | repaglinide | nateglinide | chlorpropamide | glimepiride | acetohexamide | glipizide | glyburide | tolbutamide | pioglitazone | rosiglitazone | acarbose | miglitol | troglitazone | tolazamide | examide | citoglipton | insulin | glyburide.metformin | glipizide.metformin | glimepiride.pioglitazone | metformin.rosiglitazone | metformin.pioglitazone | change | diabetesMed | readmitted |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 768 | 768 | 768 | 768 | 768 | 768 | 768 | 768 | 768 | 768 | 768 | 768 | 768 | 768 | 768 | 768 | 768 | 768 | 768 | 768 | 768 | 768 | 768 | 768 | 768 | 768 | 768 | 768 | 768 | 768 | 768 | 768 | 768 | 768 | 768 | 768 | 768 | 768 | 768 | 768 | 768 | 768 | 768 | 768 | 768 | 768 | 768 |
| 1 | 196 | 196 | 196 | 196 | 196 | 196 | 196 | 196 | 196 | 196 | 196 | 196 | 196 | 196 | 196 | 196 | 196 | 196 | 196 | 196 | 196 | 196 | 196 | 196 | 196 | 196 | 196 | 196 | 196 | 196 | 196 | 196 | 196 | 196 | 196 | 196 | 196 | 196 | 196 | 196 | 196 | 196 | 196 | 196 | 196 | 196 | 196 |
| 2 | 1003 | 1003 | 1003 | 1003 | 1003 | 1003 | 1003 | 1003 | 1003 | 1003 | 1003 | 1003 | 1003 | 1003 | 1003 | 1003 | 1003 | 1003 | 1003 | 1003 | 1003 | 1003 | 1003 | 1003 | 1003 | 1003 | 1003 | 1003 | 1003 | 1003 | 1003 | 1003 | 1003 | 1003 | 1003 | 1003 | 1003 | 1003 | 1003 | 1003 | 1003 | 1003 | 1003 | 1003 | 1003 | 1003 | 1003 |
| 3 | 14633 | 14633 | 14633 | 14633 | 14633 | 14633 | 14633 | 14633 | 14633 | 14633 | 14633 | 14633 | 14633 | 14633 | 14633 | 14633 | 14633 | 14633 | 14633 | 14633 | 14633 | 14633 | 14633 | 14633 | 14633 | 14633 | 14633 | 14633 | 14633 | 14633 | 14633 | 14633 | 14633 | 14633 | 14633 | 14633 | 14633 | 14633 | 14633 | 14633 | 14633 | 14633 | 14633 | 14633 | 14633 | 14633 | 14633 |
| 4 | 1369 | 1369 | 1369 | 1369 | 1369 | 1369 | 1369 | 1369 | 1369 | 1369 | 1369 | 1369 | 1369 | 1369 | 1369 | 1369 | 1369 | 1369 | 1369 | 1369 | 1369 | 1369 | 1369 | 1369 | 1369 | 1369 | 1369 | 1369 | 1369 | 1369 | 1369 | 1369 | 1369 | 1369 | 1369 | 1369 | 1369 | 1369 | 1369 | 1369 | 1369 | 1369 | 1369 | 1369 | 1369 | 1369 | 1369 |
| 5 | 1744 | 1744 | 1744 | 1744 | 1744 | 1744 | 1744 | 1744 | 1744 | 1744 | 1744 | 1744 | 1744 | 1744 | 1744 | 1744 | 1744 | 1744 | 1744 | 1744 | 1744 | 1744 | 1744 | 1744 | 1744 | 1744 | 1744 | 1744 | 1744 | 1744 | 1744 | 1744 | 1744 | 1744 | 1744 | 1744 | 1744 | 1744 | 1744 | 1744 | 1744 | 1744 | 1744 | 1744 | 1744 | 1744 | 1744 |
| 6 | 1106 | 1106 | 1106 | 1106 | 1106 | 1106 | 1106 | 1106 | 1106 | 1106 | 1106 | 1106 | 1106 | 1106 | 1106 | 1106 | 1106 | 1106 | 1106 | 1106 | 1106 | 1106 | 1106 | 1106 | 1106 | 1106 | 1106 | 1106 | 1106 | 1106 | 1106 | 1106 | 1106 | 1106 | 1106 | 1106 | 1106 | 1106 | 1106 | 1106 | 1106 | 1106 | 1106 | 1106 | 1106 | 1106 | 1106 |
| 8 | 231 | 231 | 231 | 231 | 231 | 231 | 231 | 231 | 231 | 231 | 231 | 231 | 231 | 231 | 231 | 231 | 231 | 231 | 231 | 231 | 231 | 231 | 231 | 231 | 231 | 231 | 231 | 231 | 231 | 231 | 231 | 231 | 231 | 231 | 231 | 231 | 231 | 231 | 231 | 231 | 231 | 231 | 231 | 231 | 231 | 231 | 231 |
| 9 | 6087 | 6087 | 6087 | 6087 | 6087 | 6087 | 6087 | 6087 | 6087 | 6087 | 6087 | 6087 | 6087 | 6087 | 6087 | 6087 | 6087 | 6087 | 6087 | 6087 | 6087 | 6087 | 6087 | 6087 | 6087 | 6087 | 6087 | 6087 | 6087 | 6087 | 6087 | 6087 | 6087 | 6087 | 6087 | 6087 | 6087 | 6087 | 6087 | 6087 | 6087 | 6087 | 6087 | 6087 | 6087 | 6087 | 6087 |
| 10 | 3161 | 3161 | 3161 | 3161 | 3161 | 3161 | 3161 | 3161 | 3161 | 3161 | 3161 | 3161 | 3161 | 3161 | 3161 | 3161 | 3161 | 3161 | 3161 | 3161 | 3161 | 3161 | 3161 | 3161 | 3161 | 3161 | 3161 | 3161 | 3161 | 3161 | 3161 | 3161 | 3161 | 3161 | 3161 | 3161 | 3161 | 3161 | 3161 | 3161 | 3161 | 3161 | 3161 | 3161 | 3161 | 3161 | 3161 |
| 11 | 114 | 114 | 114 | 114 | 114 | 114 | 114 | 114 | 114 | 114 | 114 | 114 | 114 | 114 | 114 | 114 | 114 | 114 | 114 | 114 | 114 | 114 | 114 | 114 | 114 | 114 | 114 | 114 | 114 | 114 | 114 | 114 | 114 | 114 | 114 | 114 | 114 | 114 | 114 | 114 | 114 | 114 | 114 | 114 | 114 | 114 | 114 |
| 12 | 3547 | 3547 | 3547 | 3547 | 3547 | 3547 | 3547 | 3547 | 3547 | 3547 | 3547 | 3547 | 3547 | 3547 | 3547 | 3547 | 3547 | 3547 | 3547 | 3547 | 3547 | 3547 | 3547 | 3547 | 3547 | 3547 | 3547 | 3547 | 3547 | 3547 | 3547 | 3547 | 3547 | 3547 | 3547 | 3547 | 3547 | 3547 | 3547 | 3547 | 3547 | 3547 | 3547 | 3547 | 3547 | 3547 | 3547 |
| 13 | 2526 | 2526 | 2526 | 2526 | 2526 | 2526 | 2526 | 2526 | 2526 | 2526 | 2526 | 2526 | 2526 | 2526 | 2526 | 2526 | 2526 | 2526 | 2526 | 2526 | 2526 | 2526 | 2526 | 2526 | 2526 | 2526 | 2526 | 2526 | 2526 | 2526 | 2526 | 2526 | 2526 | 2526 | 2526 | 2526 | 2526 | 2526 | 2526 | 2526 | 2526 | 2526 | 2526 | 2526 | 2526 | 2526 | 2526 |
| 14 | 374 | 374 | 374 | 374 | 374 | 374 | 374 | 374 | 374 | 374 | 374 | 374 | 374 | 374 | 374 | 374 | 374 | 374 | 374 | 374 | 374 | 374 | 374 | 374 | 374 | 374 | 374 | 374 | 374 | 374 | 374 | 374 | 374 | 374 | 374 | 374 | 374 | 374 | 374 | 374 | 374 | 374 | 374 | 374 | 374 | 374 | 374 |
| 15 | 741 | 741 | 741 | 741 | 741 | 741 | 741 | 741 | 741 | 741 | 741 | 741 | 741 | 741 | 741 | 741 | 741 | 741 | 741 | 741 | 741 | 741 | 741 | 741 | 741 | 741 | 741 | 741 | 741 | 741 | 741 | 741 | 741 | 741 | 741 | 741 | 741 | 741 | 741 | 741 | 741 | 741 | 741 | 741 | 741 | 741 | 741 |
| 16 | 3700 | 3700 | 3700 | 3700 | 3700 | 3700 | 3700 | 3700 | 3700 | 3700 | 3700 | 3700 | 3700 | 3700 | 3700 | 3700 | 3700 | 3700 | 3700 | 3700 | 3700 | 3700 | 3700 | 3700 | 3700 | 3700 | 3700 | 3700 | 3700 | 3700 | 3700 | 3700 | 3700 | 3700 | 3700 | 3700 | 3700 | 3700 | 3700 | 3700 | 3700 | 3700 | 3700 | 3700 | 3700 | 3700 | 3700 |
| 17 | 2016 | 2016 | 2016 | 2016 | 2016 | 2016 | 2016 | 2016 | 2016 | 2016 | 2016 | 2016 | 2016 | 2016 | 2016 | 2016 | 2016 | 2016 | 2016 | 2016 | 2016 | 2016 | 2016 | 2016 | 2016 | 2016 | 2016 | 2016 | 2016 | 2016 | 2016 | 2016 | 2016 | 2016 | 2016 | 2016 | 2016 | 2016 | 2016 | 2016 | 2016 | 2016 | 2016 | 2016 | 2016 | 2016 | 2016 |
| 18 | 3532 | 3532 | 3532 | 3532 | 3532 | 3532 | 3532 | 3532 | 3532 | 3532 | 3532 | 3532 | 3532 | 3532 | 3532 | 3532 | 3532 | 3532 | 3532 | 3532 | 3532 | 3532 | 3532 | 3532 | 3532 | 3532 | 3532 | 3532 | 3532 | 3532 | 3532 | 3532 | 3532 | 3532 | 3532 | 3532 | 3532 | 3532 | 3532 | 3532 | 3532 | 3532 | 3532 | 3532 | 3532 | 3532 | 3532 |
| 19 | 150 | 150 | 150 | 150 | 150 | 150 | 150 | 150 | 150 | 150 | 150 | 150 | 150 | 150 | 150 | 150 | 150 | 150 | 150 | 150 | 150 | 150 | 150 | 150 | 150 | 150 | 150 | 150 | 150 | 150 | 150 | 150 | 150 | 150 | 150 | 150 | 150 | 150 | 150 | 150 | 150 | 150 | 150 | 150 | 150 | 150 | 150 |
| 20 | 1491 | 1491 | 1491 | 1491 | 1491 | 1491 | 1491 | 1491 | 1491 | 1491 | 1491 | 1491 | 1491 | 1491 | 1491 | 1491 | 1491 | 1491 | 1491 | 1491 | 1491 | 1491 | 1491 | 1491 | 1491 | 1491 | 1491 | 1491 | 1491 | 1491 | 1491 | 1491 | 1491 | 1491 | 1491 | 1491 | 1491 | 1491 | 1491 | 1491 | 1491 | 1491 | 1491 | 1491 | 1491 | 1491 | 1491 |
| 21 | 1015 | 1015 | 1015 | 1015 | 1015 | 1015 | 1015 | 1015 | 1015 | 1015 | 1015 | 1015 | 1015 | 1015 | 1015 | 1015 | 1015 | 1015 | 1015 | 1015 | 1015 | 1015 | 1015 | 1015 | 1015 | 1015 | 1015 | 1015 | 1015 | 1015 | 1015 | 1015 | 1015 | 1015 | 1015 | 1015 | 1015 | 1015 | 1015 | 1015 | 1015 | 1015 | 1015 | 1015 | 1015 | 1015 | 1015 |
| 22 | 49 | 49 | 49 | 49 | 49 | 49 | 49 | 49 | 49 | 49 | 49 | 49 | 49 | 49 | 49 | 49 | 49 | 49 | 49 | 49 | 49 | 49 | 49 | 49 | 49 | 49 | 49 | 49 | 49 | 49 | 49 | 49 | 49 | 49 | 49 | 49 | 49 | 49 | 49 | 49 | 49 | 49 | 49 | 49 | 49 | 49 | 49 |
| 23 | 2502 | 2502 | 2502 | 2502 | 2502 | 2502 | 2502 | 2502 | 2502 | 2502 | 2502 | 2502 | 2502 | 2502 | 2502 | 2502 | 2502 | 2502 | 2502 | 2502 | 2502 | 2502 | 2502 | 2502 | 2502 | 2502 | 2502 | 2502 | 2502 | 2502 | 2502 | 2502 | 2502 | 2502 | 2502 | 2502 | 2502 | 2502 | 2502 | 2502 | 2502 | 2502 | 2502 | 2502 | 2502 | 2502 | 2502 |
| 24 | 511 | 511 | 511 | 511 | 511 | 511 | 511 | 511 | 511 | 511 | 511 | 511 | 511 | 511 | 511 | 511 | 511 | 511 | 511 | 511 | 511 | 511 | 511 | 511 | 511 | 511 | 511 | 511 | 511 | 511 | 511 | 511 | 511 | 511 | 511 | 511 | 511 | 511 | 511 | 511 | 511 | 511 | 511 | 511 | 511 | 511 | 511 |
| 25 | 24 | 24 | 24 | 24 | 24 | 24 | 24 | 24 | 24 | 24 | 24 | 24 | 24 | 24 | 24 | 24 | 24 | 24 | 24 | 24 | 24 | 24 | 24 | 24 | 24 | 24 | 24 | 24 | 24 | 24 | 24 | 24 | 24 | 24 | 24 | 24 | 24 | 24 | 24 | 24 | 24 | 24 | 24 | 24 | 24 | 24 | 24 |
| 26 | 159 | 159 | 159 | 159 | 159 | 159 | 159 | 159 | 159 | 159 | 159 | 159 | 159 | 159 | 159 | 159 | 159 | 159 | 159 | 159 | 159 | 159 | 159 | 159 | 159 | 159 | 159 | 159 | 159 | 159 | 159 | 159 | 159 | 159 | 159 | 159 | 159 | 159 | 159 | 159 | 159 | 159 | 159 | 159 | 159 | 159 | 159 |
| 27 | 144 | 144 | 144 | 144 | 144 | 144 | 144 | 144 | 144 | 144 | 144 | 144 | 144 | 144 | 144 | 144 | 144 | 144 | 144 | 144 | 144 | 144 | 144 | 144 | 144 | 144 | 144 | 144 | 144 | 144 | 144 | 144 | 144 | 144 | 144 | 144 | 144 | 144 | 144 | 144 | 144 | 144 | 144 | 144 | 144 | 144 | 144 |
| 28 | 361 | 361 | 361 | 361 | 361 | 361 | 361 | 361 | 361 | 361 | 361 | 361 | 361 | 361 | 361 | 361 | 361 | 361 | 361 | 361 | 361 | 361 | 361 | 361 | 361 | 361 | 361 | 361 | 361 | 361 | 361 | 361 | 361 | 361 | 361 | 361 | 361 | 361 | 361 | 361 | 361 | 361 | 361 | 361 | 361 | 361 | 361 |
| 31 | 689 | 689 | 689 | 689 | 689 | 689 | 689 | 689 | 689 | 689 | 689 | 689 | 689 | 689 | 689 | 689 | 689 | 689 | 689 | 689 | 689 | 689 | 689 | 689 | 689 | 689 | 689 | 689 | 689 | 689 | 689 | 689 | 689 | 689 | 689 | 689 | 689 | 689 | 689 | 689 | 689 | 689 | 689 | 689 | 689 | 689 | 689 |
| 32 | 2057 | 2057 | 2057 | 2057 | 2057 | 2057 | 2057 | 2057 | 2057 | 2057 | 2057 | 2057 | 2057 | 2057 | 2057 | 2057 | 2057 | 2057 | 2057 | 2057 | 2057 | 2057 | 2057 | 2057 | 2057 | 2057 | 2057 | 2057 | 2057 | 2057 | 2057 | 2057 | 2057 | 2057 | 2057 | 2057 | 2057 | 2057 | 2057 | 2057 | 2057 | 2057 | 2057 | 2057 | 2057 | 2057 | 2057 |
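Because `groupby(...).count()` repeats the same row count under every column, a single-column count can be easier to read. The following is a minimal sketch of that alternative, using a small made-up frame (`toy`) in place of the patient data; the values are illustrative only and are not taken from the dataset.

```python
import pandas as pd

# Toy stand-in for the patient data frame; values are illustrative only.
toy = pd.DataFrame({'DIAG_CAT_3': [3, 9, 3, 0, 9, 3],
                    'age': [5, 6, 7, 8, 6, 5]})

# One count per diagnosis category, ordered by category code.
diag_counts = toy['DIAG_CAT_3'].value_counts().sort_index()
print(diag_counts)
```

On the real data the same call would be made on `df['DIAG_CAT_3']`.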
#replace the values in the medicine columns:
# No = 0
# Down = 1
# Steady = 2
# Up = 3
dose_map = {'No': 0, 'Down': 1, 'Steady': 2, 'Up': 3}
med_cols = ['metformin', 'repaglinide', 'nateglinide', 'chlorpropamide', 'glimepiride',
            'acetohexamide', 'glipizide', 'glyburide', 'tolbutamide', 'pioglitazone',
            'rosiglitazone', 'acarbose', 'miglitol', 'troglitazone', 'tolazamide',
            'examide', 'citoglipton', 'insulin', 'glyburide.metformin',
            'glipizide.metformin', 'glimepiride.pioglitazone',
            'metformin.rosiglitazone', 'metformin.pioglitazone']
# apply the same mapping to each medicine column (equivalent to one replace call per column)
df = df.replace({col: dose_map for col in med_cols})
df.head(2)
| | race | gender | age | weight | admission_type_id | discharge_disposition_id | admission_source_id | time_in_hospital | payer_code | MED_SPEC_NUM | num_lab_procedures | num_procedures | num_medications | number_outpatient | number_emergency | number_inpatient | DIAG_CAT_1 | DIAG_CAT_2 | DIAG_CAT_3 | number_diagnoses | max_glu_serum | A1Cresult | metformin | repaglinide | nateglinide | chlorpropamide | glimepiride | acetohexamide | glipizide | glyburide | tolbutamide | pioglitazone | rosiglitazone | acarbose | miglitol | troglitazone | tolazamide | examide | citoglipton | insulin | glyburide.metformin | glipizide.metformin | glimepiride.pioglitazone | metformin.rosiglitazone | metformin.pioglitazone | change | diabetesMed | readmitted |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 3 | 0 | 8 | 0 | 3 | 1 | 4 | 5 | 0 | 0 | 39 | 3 | 11 | 0 | 0 | 0 | 10 | 4 | 18 | 7 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 2 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 1 |
| 1 | 3 | 0 | 7 | 0 | 5 | 3 | 1 | 6 | 7 | 0 | 79 | 1 | 25 | 3 | 0 | 0 | 16 | 13 | 16 | 9 | 0 | 0 | 2 | 0 | 0 | 0 | 0 | 0 | 0 | 2 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 3 | 0 | 0 | 0 | 0 | 0 | 1 | 1 | 0 |
# check to make sure all factors are now int
df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 56000 entries, 0 to 55999
Data columns (total 48 columns):
race 56000 non-null int64
gender 56000 non-null int64
age 56000 non-null int64
weight 56000 non-null int64
admission_type_id 56000 non-null int64
discharge_disposition_id 56000 non-null int64
admission_source_id 56000 non-null int64
time_in_hospital 56000 non-null int64
payer_code 56000 non-null int64
MED_SPEC_NUM 56000 non-null int64
num_lab_procedures 56000 non-null int64
num_procedures 56000 non-null int64
num_medications 56000 non-null int64
number_outpatient 56000 non-null int64
number_emergency 56000 non-null int64
number_inpatient 56000 non-null int64
DIAG_CAT_1 56000 non-null int64
DIAG_CAT_2 56000 non-null int64
DIAG_CAT_3 56000 non-null int64
number_diagnoses 56000 non-null int64
max_glu_serum 56000 non-null int64
A1Cresult 56000 non-null int64
metformin 56000 non-null int64
repaglinide 56000 non-null int64
nateglinide 56000 non-null int64
chlorpropamide 56000 non-null int64
glimepiride 56000 non-null int64
acetohexamide 56000 non-null int64
glipizide 56000 non-null int64
glyburide 56000 non-null int64
tolbutamide 56000 non-null int64
pioglitazone 56000 non-null int64
rosiglitazone 56000 non-null int64
acarbose 56000 non-null int64
miglitol 56000 non-null int64
troglitazone 56000 non-null int64
tolazamide 56000 non-null int64
examide 56000 non-null int64
citoglipton 56000 non-null int64
insulin 56000 non-null int64
glyburide.metformin 56000 non-null int64
glipizide.metformin 56000 non-null int64
glimepiride.pioglitazone 56000 non-null int64
metformin.rosiglitazone 56000 non-null int64
metformin.pioglitazone 56000 non-null int64
change 56000 non-null int64
diabetesMed 56000 non-null int64
readmitted 56000 non-null int64
dtypes: int64(48)
memory usage: 20.5 MB
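Reading the `info()` listing by eye works for 48 columns, but the same check can be done programmatically. The sketch below is one possible way, shown on a small made-up frame (`toy`); the name `non_int_cols` is introduced here for illustration and is not part of the original notebook.

```python
import pandas as pd

# Toy frame standing in for the converted data; values are illustrative only.
toy = pd.DataFrame({'metformin': [0, 2, 3], 'insulin': [1, 0, 2]})

# Names of any columns that are not integer-typed
# (the list is empty when every column was converted).
non_int_cols = [c for c in toy.columns
                if not pd.api.types.is_integer_dtype(toy[c])]
print(non_int_cols)
```

An empty list indicates that no string-valued columns remain.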
# assign the converted, integer-only data frame to a new name (it is written to file below)
df_clean_NoString = df
# check to make sure all factors of the new data frame are int
df_clean_NoString.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 56000 entries, 0 to 55999
Data columns (total 48 columns):
race 56000 non-null int64
gender 56000 non-null int64
age 56000 non-null int64
weight 56000 non-null int64
admission_type_id 56000 non-null int64
discharge_disposition_id 56000 non-null int64
admission_source_id 56000 non-null int64
time_in_hospital 56000 non-null int64
payer_code 56000 non-null int64
MED_SPEC_NUM 56000 non-null int64
num_lab_procedures 56000 non-null int64
num_procedures 56000 non-null int64
num_medications 56000 non-null int64
number_outpatient 56000 non-null int64
number_emergency 56000 non-null int64
number_inpatient 56000 non-null int64
DIAG_CAT_1 56000 non-null int64
DIAG_CAT_2 56000 non-null int64
DIAG_CAT_3 56000 non-null int64
number_diagnoses 56000 non-null int64
max_glu_serum 56000 non-null int64
A1Cresult 56000 non-null int64
metformin 56000 non-null int64
repaglinide 56000 non-null int64
nateglinide 56000 non-null int64
chlorpropamide 56000 non-null int64
glimepiride 56000 non-null int64
acetohexamide 56000 non-null int64
glipizide 56000 non-null int64
glyburide 56000 non-null int64
tolbutamide 56000 non-null int64
pioglitazone 56000 non-null int64
rosiglitazone 56000 non-null int64
acarbose 56000 non-null int64
miglitol 56000 non-null int64
troglitazone 56000 non-null int64
tolazamide 56000 non-null int64
examide 56000 non-null int64
citoglipton 56000 non-null int64
insulin 56000 non-null int64
glyburide.metformin 56000 non-null int64
glipizide.metformin 56000 non-null int64
glimepiride.pioglitazone 56000 non-null int64
metformin.rosiglitazone 56000 non-null int64
metformin.pioglitazone 56000 non-null int64
change 56000 non-null int64
diabetesMed 56000 non-null int64
readmitted 56000 non-null int64
dtypes: int64(48)
memory usage: 20.5 MB
# write dataframe with no string values to new csv file
df_clean_NoString.to_csv('data/Challenge_1_Training_Work_Clean_NoString.csv')
# input exploratory analysis slide here
Image('Images/christensen_finalprj/Slide9.png')
Exploratory data analysis¶
#import ETL patient data
df = pd.read_csv('data/Challenge_1_Training_Work_Clean_NoString.csv')
df.head(5)
| | Unnamed: 0 | race | gender | age | weight | admission_type_id | discharge_disposition_id | admission_source_id | time_in_hospital | payer_code | MED_SPEC_NUM | num_lab_procedures | num_procedures | num_medications | number_outpatient | number_emergency | number_inpatient | DIAG_CAT_1 | DIAG_CAT_2 | DIAG_CAT_3 | number_diagnoses | max_glu_serum | A1Cresult | metformin | repaglinide | nateglinide | chlorpropamide | glimepiride | acetohexamide | glipizide | glyburide | tolbutamide | pioglitazone | rosiglitazone | acarbose | miglitol | troglitazone | tolazamide | examide | citoglipton | insulin | glyburide.metformin | glipizide.metformin | glimepiride.pioglitazone | metformin.rosiglitazone | metformin.pioglitazone | change | diabetesMed | readmitted |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0 | 3 | 0 | 8 | 0 | 3 | 1 | 4 | 5 | 0 | 0 | 39 | 3 | 11 | 0 | 0 | 0 | 10 | 4 | 18 | 7 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 2 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 1 |
| 1 | 1 | 3 | 0 | 7 | 0 | 5 | 3 | 1 | 6 | 7 | 0 | 79 | 1 | 25 | 3 | 0 | 0 | 16 | 13 | 16 | 9 | 0 | 0 | 2 | 0 | 0 | 0 | 0 | 0 | 0 | 2 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 3 | 0 | 0 | 0 | 0 | 0 | 1 | 1 | 0 |
| 2 | 2 | 5 | 0 | 6 | 0 | 1 | 22 | 7 | 4 | 14 | 18 | 29 | 2 | 18 | 0 | 0 | 1 | 24 | 18 | 2 | 9 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 2 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 |
| 3 | 3 | 3 | 1 | 7 | 0 | 1 | 1 | 7 | 3 | 0 | 18 | 72 | 3 | 18 | 0 | 0 | 0 | 17 | 4 | 3 | 9 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 2 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 1 |
| 4 | 4 | 3 | 0 | 3 | 0 | 2 | 1 | 1 | 3 | 0 | 0 | 21 | 1 | 6 | 0 | 0 | 0 | 23 | 18 | 32 | 9 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 56000 entries, 0 to 55999
Data columns (total 49 columns):
Unnamed: 0 56000 non-null int64
race 56000 non-null int64
gender 56000 non-null int64
age 56000 non-null int64
weight 56000 non-null int64
admission_type_id 56000 non-null int64
discharge_disposition_id 56000 non-null int64
admission_source_id 56000 non-null int64
time_in_hospital 56000 non-null int64
payer_code 56000 non-null int64
MED_SPEC_NUM 56000 non-null int64
num_lab_procedures 56000 non-null int64
num_procedures 56000 non-null int64
num_medications 56000 non-null int64
number_outpatient 56000 non-null int64
number_emergency 56000 non-null int64
number_inpatient 56000 non-null int64
DIAG_CAT_1 56000 non-null int64
DIAG_CAT_2 56000 non-null int64
DIAG_CAT_3 56000 non-null int64
number_diagnoses 56000 non-null int64
max_glu_serum 56000 non-null int64
A1Cresult 56000 non-null int64
metformin 56000 non-null int64
repaglinide 56000 non-null int64
nateglinide 56000 non-null int64
chlorpropamide 56000 non-null int64
glimepiride 56000 non-null int64
acetohexamide 56000 non-null int64
glipizide 56000 non-null int64
glyburide 56000 non-null int64
tolbutamide 56000 non-null int64
pioglitazone 56000 non-null int64
rosiglitazone 56000 non-null int64
acarbose 56000 non-null int64
miglitol 56000 non-null int64
troglitazone 56000 non-null int64
tolazamide 56000 non-null int64
examide 56000 non-null int64
citoglipton 56000 non-null int64
insulin 56000 non-null int64
glyburide.metformin 56000 non-null int64
glipizide.metformin 56000 non-null int64
glimepiride.pioglitazone 56000 non-null int64
metformin.rosiglitazone 56000 non-null int64
metformin.pioglitazone 56000 non-null int64
change 56000 non-null int64
diabetesMed 56000 non-null int64
readmitted 56000 non-null int64
dtypes: int64(49)
memory usage: 20.9 MB
#drop the column 'Unnamed: 0' (the row index written out by to_csv), since it is not used in the analysis, and display the result
df = df.drop('Unnamed: 0', axis=1)
df.head(2)
| | race | gender | age | weight | admission_type_id | discharge_disposition_id | admission_source_id | time_in_hospital | payer_code | MED_SPEC_NUM | num_lab_procedures | num_procedures | num_medications | number_outpatient | number_emergency | number_inpatient | DIAG_CAT_1 | DIAG_CAT_2 | DIAG_CAT_3 | number_diagnoses | max_glu_serum | A1Cresult | metformin | repaglinide | nateglinide | chlorpropamide | glimepiride | acetohexamide | glipizide | glyburide | tolbutamide | pioglitazone | rosiglitazone | acarbose | miglitol | troglitazone | tolazamide | examide | citoglipton | insulin | glyburide.metformin | glipizide.metformin | glimepiride.pioglitazone | metformin.rosiglitazone | metformin.pioglitazone | change | diabetesMed | readmitted |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 3 | 0 | 8 | 0 | 3 | 1 | 4 | 5 | 0 | 0 | 39 | 3 | 11 | 0 | 0 | 0 | 10 | 4 | 18 | 7 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 2 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 1 |
| 1 | 3 | 0 | 7 | 0 | 5 | 3 | 1 | 6 | 7 | 0 | 79 | 1 | 25 | 3 | 0 | 0 | 16 | 13 | 16 | 9 | 0 | 0 | 2 | 0 | 0 | 0 | 0 | 0 | 0 | 2 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 3 | 0 | 0 | 0 | 0 | 0 | 1 | 1 | 0 |
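The `Unnamed: 0` column appears because `to_csv` writes the row index by default and `read_csv` then loads it back as an ordinary column. As a sketch of an alternative to dropping it afterwards, the index can be left out when writing. The example uses an in-memory buffer and a made-up two-row frame (`toy`) rather than the real CSV file.

```python
import io
import pandas as pd

# Toy frame; values are illustrative only.
toy = pd.DataFrame({'race': [3, 3], 'gender': [0, 1]})

# Default to_csv writes the index as an unnamed first column,
# which read_csv then loads back as 'Unnamed: 0'.
with_index = pd.read_csv(io.StringIO(toy.to_csv()))

# Writing with index=False avoids the extra column.
without_index = pd.read_csv(io.StringIO(toy.to_csv(index=False)))

print(list(with_index.columns))
print(list(without_index.columns))
```

Either approach ends with the same columns; writing with `index=False` only saves the later drop step.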
# basic statistics
df.describe()
| | race | gender | age | weight | admission_type_id | discharge_disposition_id | admission_source_id | time_in_hospital | payer_code | MED_SPEC_NUM | num_lab_procedures | num_procedures | num_medications | number_outpatient | number_emergency | number_inpatient | DIAG_CAT_1 | DIAG_CAT_2 | DIAG_CAT_3 | number_diagnoses | max_glu_serum | A1Cresult | metformin | repaglinide | nateglinide | chlorpropamide | glimepiride | acetohexamide | glipizide | glyburide | tolbutamide | pioglitazone | rosiglitazone | acarbose | miglitol | troglitazone | tolazamide | examide | citoglipton | insulin | glyburide.metformin | glipizide.metformin | glimepiride.pioglitazone | metformin.rosiglitazone | metformin.pioglitazone | change | diabetesMed | readmitted |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| count | 56000.000000 | 56000.000000 | 56000.000000 | 56000.000000 | 56000.000000 | 56000.000000 | 56000.000000 | 56000.000000 | 56000.000000 | 56000.000000 | 56000.000000 | 56000.000000 | 56000.000000 | 56000.000000 | 56000.000000 | 56000.000000 | 56000.000000 | 56000.000000 | 56000.000000 | 56000.000000 | 56000.000000 | 56000.000000 | 56000.000000 | 56000.000000 | 56000.000000 | 56000.000000 | 56000.000000 | 56000.000000 | 56000.000000 | 56000.000000 | 56000.000000 | 56000.000000 | 56000.000000 | 56000.000000 | 56000.000000 | 56000.000000 | 56000.000000 | 56000.0 | 56000.0 | 56000.000000 | 56000.000000 | 56000.000000 | 56000.0 | 56000.000000 | 56000.000000 | 56000.000000 | 56000.000000 | 56000.000000 |
| mean | 2.602071 | 0.464464 | 6.096589 | 0.123946 | 2.016893 | 3.721821 | 5.756643 | 4.398161 | 4.369375 | 10.668643 | 43.141661 | 1.335893 | 16.009268 | 0.367321 | 0.196875 | 0.637054 | 14.213321 | 12.011054 | 11.357411 | 7.423750 | 0.092089 | 0.368375 | 0.398875 | 0.029911 | 0.014089 | 0.001732 | 0.102536 | 0.000036 | 0.254732 | 0.210071 | 0.000357 | 0.146161 | 0.125821 | 0.006214 | 0.000857 | 0.000107 | 0.000714 | 0.0 | 0.0 | 1.058839 | 0.013214 | 0.000321 | 0.0 | 0.000036 | 0.000036 | 0.462679 | 0.769821 | 0.572268 |
| std | 0.937754 | 0.498740 | 1.590761 | 0.712004 | 1.438340 | 5.291517 | 4.053838 | 2.984346 | 4.363828 | 15.595799 | 19.656507 | 1.702009 | 8.132455 | 1.249570 | 0.916820 | 1.270768 | 7.272908 | 7.443902 | 8.157131 | 1.931488 | 0.431655 | 0.890972 | 0.815169 | 0.247161 | 0.169132 | 0.060480 | 0.449274 | 0.008452 | 0.678992 | 0.627625 | 0.026724 | 0.525985 | 0.490002 | 0.112904 | 0.042249 | 0.014638 | 0.037790 | 0.0 | 0.0 | 1.102484 | 0.162472 | 0.025353 | 0.0 | 0.008452 | 0.008452 | 0.498610 | 0.420951 | 0.685018 |
| min | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 0.000000 | 0.000000 | 1.000000 | 0.000000 | 1.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 1.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.0 | 0.0 | 0.000000 | 0.000000 | 0.000000 | 0.0 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 |
| 25% | 3.000000 | 0.000000 | 5.000000 | 0.000000 | 1.000000 | 1.000000 | 1.000000 | 2.000000 | 0.000000 | 0.000000 | 32.000000 | 0.000000 | 10.000000 | 0.000000 | 0.000000 | 0.000000 | 10.000000 | 4.000000 | 3.000000 | 6.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.0 | 0.0 | 0.000000 | 0.000000 | 0.000000 | 0.0 | 0.000000 | 0.000000 | 0.000000 | 1.000000 | 0.000000 |
| 50% | 3.000000 | 0.000000 | 6.000000 | 0.000000 | 1.000000 | 1.000000 | 7.000000 | 4.000000 | 6.000000 | 4.000000 | 44.000000 | 1.000000 | 15.000000 | 0.000000 | 0.000000 | 0.000000 | 15.000000 | 12.000000 | 10.000000 | 8.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.0 | 0.0 | 1.000000 | 0.000000 | 0.000000 | 0.0 | 0.000000 | 0.000000 | 0.000000 | 1.000000 | 0.000000 |
| 75% | 3.000000 | 1.000000 | 7.000000 | 0.000000 | 3.000000 | 4.000000 | 7.000000 | 6.000000 | 7.000000 | 18.000000 | 57.000000 | 2.000000 | 20.000000 | 0.000000 | 0.000000 | 1.000000 | 18.000000 | 17.000000 | 17.000000 | 9.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.0 | 0.0 | 2.000000 | 0.000000 | 0.000000 | 0.0 | 0.000000 | 0.000000 | 1.000000 | 1.000000 | 1.000000 |
| max | 5.000000 | 1.000000 | 9.000000 | 9.000000 | 8.000000 | 28.000000 | 25.000000 | 14.000000 | 16.000000 | 63.000000 | 132.000000 | 6.000000 | 75.000000 | 42.000000 | 76.000000 | 21.000000 | 32.000000 | 32.000000 | 32.000000 | 16.000000 | 3.000000 | 3.000000 | 3.000000 | 3.000000 | 3.000000 | 3.000000 | 3.000000 | 2.000000 | 3.000000 | 3.000000 | 2.000000 | 3.000000 | 3.000000 | 3.000000 | 3.000000 | 2.000000 | 2.000000 | 0.0 | 0.0 | 3.000000 | 3.000000 | 2.000000 | 0.0 | 2.000000 | 2.000000 | 1.000000 | 1.000000 | 2.000000 |
Basic Statistics Notes¶
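- Mean (typical) patient: Caucasian, female, in their 60s, urgent admission, discharged/transferred, transferred from a facility, 4.4 days in hospital, CP payer, 43 lab procedures, 16 medications,
- 0.4 outpatient visits in the previous year, 0.2 ER visits, 0.64 inpatient visits, 7.4 diagnoses, max_glu_serum 0.09, A1Cresult 0.4
- Several medications show little or no use (mean close to 0), so some of these medication columns need to be eliminated
- 'readmitted' needs to be recoded as a binary dummy variable: combine No (0) and >30 (1) into one class, since the question is only whether a patient was readmitted within 30 days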
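The recoding described in the last note could look roughly like the sketch below. It runs on a small toy frame rather than the notebook's `df`, and it assumes that code 2 stands for readmission within 30 days (inferred from the note that No is 0 and >30 is 1). The column name `readmitted_lt30` is a hypothetical name chosen for illustration.

```python
import pandas as pd

# Toy stand-in for the notebook's df. Assumed coding (not confirmed by the source):
# 0 = not readmitted, 1 = readmitted after 30 days, 2 = readmitted within 30 days.
toy = pd.DataFrame({"readmitted": [0, 1, 2, 0, 2, 1]})

# Collapse codes 0 and 1 into the negative class; only code 2 is a positive.
# "readmitted_lt30" is a hypothetical column name.
toy["readmitted_lt30"] = (toy["readmitted"] == 2).astype(int)

print(toy)
```

Keeping the original column next to the new one makes it easy to check the mapping with a cross-tabulation before the original is dropped.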
# correlation analysis
df.corr()
| | race | gender | age | weight | admission_type_id | discharge_disposition_id | admission_source_id | time_in_hospital | payer_code | MED_SPEC_NUM | num_lab_procedures | num_procedures | num_medications | number_outpatient | number_emergency | number_inpatient | DIAG_CAT_1 | DIAG_CAT_2 | DIAG_CAT_3 | number_diagnoses | max_glu_serum | A1Cresult | metformin | repaglinide | nateglinide | chlorpropamide | glimepiride | acetohexamide | glipizide | glyburide | tolbutamide | pioglitazone | rosiglitazone | acarbose | miglitol | troglitazone | tolazamide | examide | citoglipton | insulin | glyburide.metformin | glipizide.metformin | glimepiride.pioglitazone | metformin.rosiglitazone | metformin.pioglitazone | change | diabetesMed | readmitted |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| race | 1.000000 | 0.061706 | 0.114255 | 0.040520 | 0.096587 | 0.005805 | 0.033113 | -0.020364 | 0.041640 | -0.030777 | -0.023193 | 0.024391 | 0.022157 | 0.050845 | -0.012812 | -0.006053 | 0.042924 | 0.029594 | 0.016000 | 0.081672 | 0.054576 | -0.013318 | 0.010548 | 0.025466 | -0.004170 | 0.006801 | 0.008261 | 0.001793 | 0.018551 | 0.015784 | -0.001455 | 0.026105 | 0.005938 | 0.013237 | -0.001307 | 0.003106 | 0.003990 | NaN | NaN | -0.039862 | 0.006384 | 0.005380 | NaN | -0.011726 | 0.001793 | 0.008300 | -0.004537 | 0.014912 |
| gender | 0.061706 | 1.000000 | -0.048579 | 0.014491 | 0.014578 | -0.019566 | -0.005222 | -0.031088 | 0.000833 | 0.016623 | -0.004968 | 0.061668 | -0.023819 | -0.005846 | -0.024202 | -0.013405 | -0.034311 | 0.008083 | 0.008343 | -0.007818 | -0.001347 | 0.016539 | 0.001549 | -0.004777 | -0.005390 | 0.006481 | -0.000156 | -0.003935 | 0.026810 | 0.034631 | -0.001727 | 0.002339 | 0.010843 | 0.010581 | 0.009920 | 0.007860 | 0.003242 | NaN | NaN | 0.000247 | 0.002489 | 0.007965 | NaN | 0.004538 | -0.003935 | 0.012476 | 0.015391 | -0.013626 |
| age | 0.114255 | -0.048579 | 1.000000 | 0.005716 | -0.005747 | 0.113970 | 0.041070 | 0.107273 | 0.058032 | -0.068202 | 0.025665 | -0.028360 | 0.039010 | 0.029064 | -0.089149 | -0.047012 | 0.091837 | 0.077541 | 0.052021 | 0.243515 | 0.018618 | -0.147559 | -0.060696 | 0.045565 | 0.020363 | 0.012367 | 0.044360 | 0.002400 | 0.055867 | 0.076798 | 0.010110 | 0.013860 | 0.003034 | 0.008092 | 0.011788 | -0.001978 | 0.003605 | NaN | NaN | -0.079078 | -0.002451 | 0.003658 | NaN | 0.002400 | -0.000257 | -0.037793 | -0.025360 | 0.029704 |
| weight | 0.040520 | 0.014491 | 0.005716 | 1.000000 | 0.037503 | -0.035383 | 0.003026 | 0.023652 | 0.047819 | 0.004630 | 0.090456 | 0.018693 | 0.011274 | 0.104440 | 0.003706 | -0.009154 | 0.023982 | 0.031824 | 0.014000 | 0.054391 | -0.037139 | -0.021109 | 0.007304 | -0.005440 | 0.010707 | -0.000839 | 0.013694 | -0.000736 | 0.017062 | 0.008707 | -0.002326 | 0.026059 | 0.004232 | 0.010411 | -0.003532 | -0.001274 | 0.000692 | NaN | NaN | -0.076697 | -0.014159 | -0.002207 | NaN | -0.000736 | -0.000736 | -0.041219 | -0.030585 | 0.027236 |
| admission_type_id | 0.096587 | 0.014578 | -0.005747 | 0.037503 | 1.000000 | 0.085986 | 0.098007 | -0.014285 | -0.136863 | 0.185351 | -0.145869 | 0.131923 | 0.075711 | 0.030746 | -0.018190 | -0.032648 | 0.032151 | -0.005648 | -0.008918 | -0.113991 | 0.352793 | -0.043929 | 0.008631 | -0.003481 | -0.008099 | 0.007875 | -0.003178 | -0.002988 | 0.007991 | -0.002804 | 0.006347 | 0.018570 | 0.022930 | 0.006061 | -0.001414 | 0.003307 | 0.010291 | NaN | NaN | -0.025368 | -0.000573 | -0.005046 | NaN | -0.002988 | 0.002888 | 0.003992 | -0.003930 | -0.008561 |
| discharge_disposition_id | 0.005805 | -0.019566 | 0.113970 | -0.035383 | 0.085986 | 1.000000 | 0.016614 | 0.161954 | -0.123220 | -0.024028 | 0.022906 | 0.015536 | 0.105415 | -0.006101 | -0.024692 | 0.019240 | 0.034616 | 0.029774 | 0.024778 | 0.049496 | 0.037086 | -0.020713 | -0.008376 | -0.002759 | -0.008790 | 0.018525 | -0.022360 | 0.014597 | -0.013379 | 0.048256 | 0.003228 | -0.014116 | -0.001694 | 0.006779 | 0.005779 | 0.008684 | 0.013139 | NaN | NaN | -0.041842 | -0.002994 | 0.000933 | NaN | -0.002174 | -0.000576 | -0.014047 | -0.029452 | 0.009300 |
| admission_source_id | 0.033113 | -0.005222 | 0.041070 | 0.003026 | 0.098007 | 0.016614 | 1.000000 | -0.006996 | -0.100157 | -0.152760 | 0.046823 | -0.137044 | -0.055016 | 0.028833 | 0.061938 | 0.033697 | -0.007753 | -0.019796 | 0.001447 | 0.076318 | 0.412356 | 0.006512 | -0.033283 | -0.003732 | -0.019612 | 0.002666 | -0.026685 | 0.001296 | 0.009300 | 0.004919 | 0.001791 | -0.005729 | -0.008894 | -0.000753 | -0.000763 | 0.002245 | 0.001834 | NaN | NaN | 0.005094 | -0.024616 | -0.000281 | NaN | 0.001296 | -0.004958 | 0.002583 | 0.000535 | 0.030377 |
| time_in_hospital | -0.020364 | -0.031088 | 0.107273 | 0.023652 | -0.014285 | 0.161954 | -0.006996 | 1.000000 | -0.037805 | 0.023146 | 0.318234 | 0.193139 | 0.468752 | -0.003410 | -0.005467 | 0.079929 | -0.019913 | 0.086503 | 0.068677 | 0.224265 | 0.029079 | 0.058088 | -0.009071 | 0.034985 | 0.003320 | 0.004094 | 0.016086 | 0.013596 | 0.016737 | 0.023482 | 0.001799 | 0.008521 | 0.008531 | 0.007231 | 0.005083 | 0.004746 | 0.000328 | NaN | NaN | 0.101223 | -0.006358 | -0.001692 | NaN | -0.003396 | 0.002268 | 0.112359 | 0.059464 | 0.057129 |
| payer_code | 0.041640 | 0.000833 | 0.058032 | 0.047819 | -0.136863 | -0.123220 | -0.100157 | -0.037805 | 1.000000 | -0.082746 | -0.049680 | -0.047581 | 0.005658 | 0.062572 | 0.067316 | 0.009598 | 0.008458 | 0.036335 | 0.033135 | 0.076424 | -0.095739 | -0.006824 | 0.027596 | 0.032986 | 0.014676 | -0.022046 | 0.038055 | -0.004231 | 0.005875 | -0.047599 | -0.002662 | 0.034867 | -0.008782 | -0.002629 | 0.011455 | -0.007329 | -0.015677 | NaN | NaN | 0.115265 | 0.055730 | 0.010871 | NaN | 0.009326 | -0.000358 | 0.121010 | 0.077597 | 0.004353 |
| MED_SPEC_NUM | -0.030777 | 0.016623 | -0.068202 | 0.004630 | 0.185351 | -0.024028 | -0.152760 | 0.023146 | -0.082746 | 1.000000 | -0.068863 | 0.076952 | 0.036943 | -0.051445 | -0.009879 | -0.013909 | 0.018820 | -0.019354 | -0.015192 | -0.176693 | -0.003316 | -0.009813 | 0.023068 | 0.010220 | 0.006590 | 0.002161 | 0.012798 | -0.002891 | 0.007273 | -0.005929 | -0.001944 | 0.002210 | 0.016639 | -0.005808 | 0.002816 | -0.002660 | -0.004689 | NaN | NaN | -0.014342 | 0.000051 | -0.006234 | NaN | -0.002891 | 0.010115 | -0.005111 | -0.002299 | -0.044800 |
| num_lab_procedures | -0.023193 | -0.004968 | 0.025665 | 0.090456 | -0.145869 | 0.022906 | 0.046823 | 0.318234 | -0.049680 | -0.068863 | 1.000000 | 0.055081 | 0.267707 | -0.008437 | 0.000613 | 0.037763 | -0.071046 | 0.011204 | 0.011021 | 0.149116 | -0.124907 | 0.236383 | -0.044042 | 0.010438 | -0.008292 | -0.005659 | 0.005344 | 0.005344 | 0.012450 | -0.001768 | -0.001320 | -0.015599 | -0.010260 | -0.000654 | -0.002963 | 0.005036 | 0.000008 | NaN | NaN | 0.085401 | -0.010852 | -0.006685 | NaN | 0.001689 | -0.004330 | 0.062801 | 0.030903 | 0.035997 |
| num_procedures | 0.024391 | 0.061668 | -0.028360 | 0.018693 | 0.131923 | 0.015536 | -0.137044 | 0.193139 | -0.047581 | 0.076952 | 0.055081 | 1.000000 | 0.387685 | -0.028257 | -0.033659 | -0.061114 | -0.056866 | 0.036607 | 0.025920 | 0.074394 | -0.069910 | -0.017477 | -0.038122 | 0.005662 | -0.002359 | 0.004757 | 0.007223 | 0.006615 | 0.004999 | 0.001531 | -0.003423 | 0.016471 | 0.018742 | -0.000362 | -0.001521 | -0.005745 | 0.005154 | NaN | NaN | 0.015020 | -0.000553 | -0.006640 | NaN | -0.003317 | -0.000834 | 0.005976 | -0.009904 | -0.037714 |
| num_medications | 0.022157 | -0.023819 | 0.039010 | 0.011274 | 0.075711 | 0.105415 | -0.055016 | 0.468752 | 0.005658 | 0.036943 | 0.267707 | 0.387685 | 1.000000 | 0.047313 | 0.017129 | 0.066793 | 0.004288 | 0.084268 | 0.063166 | 0.263311 | 0.001639 | 0.013044 | 0.069433 | 0.019283 | 0.023352 | -0.000940 | 0.045223 | 0.009348 | 0.056985 | 0.030886 | 0.002943 | 0.071584 | 0.052860 | 0.017947 | 0.006422 | 0.002992 | -0.002113 | NaN | NaN | 0.198963 | 0.013382 | 0.002757 | NaN | -0.002603 | 0.002074 | 0.248529 | 0.186247 | 0.050711 |
| number_outpatient | 0.050845 | -0.005846 | 0.029064 | 0.104440 | 0.030746 | -0.006101 | 0.028833 | -0.003410 | 0.062572 | -0.051445 | -0.008437 | -0.028257 | 0.047313 | 1.000000 | 0.087824 | 0.103471 | -0.009347 | 0.028015 | 0.026595 | 0.093518 | 0.054949 | -0.024324 | -0.013006 | 0.001026 | 0.002719 | -0.004402 | -0.009039 | -0.001242 | 0.010527 | -0.000482 | 0.000350 | 0.012212 | -0.001550 | 0.009388 | -0.002243 | -0.002152 | -0.005556 | NaN | NaN | 0.010029 | -0.008428 | 0.003037 | NaN | -0.001242 | -0.001242 | 0.027105 | 0.017340 | 0.068145 |
| number_emergency | -0.012812 | -0.024202 | -0.089149 | 0.003706 | -0.018190 | -0.024692 | 0.061938 | -0.005467 | 0.067316 | -0.009879 | 0.000613 | -0.033659 | 0.017129 | 0.087824 | 1.000000 | 0.279626 | -0.023803 | -0.004155 | 0.007427 | 0.059398 | 0.035679 | -0.004270 | -0.009572 | 0.007820 | 0.005489 | -0.004218 | 0.003318 | -0.000907 | -0.003426 | -0.027870 | -0.002870 | -0.001978 | -0.006844 | 0.004224 | -0.000207 | -0.001572 | -0.004059 | NaN | NaN | 0.048501 | 0.001956 | -0.002723 | NaN | -0.000907 | -0.000907 | 0.041797 | 0.029415 | 0.103321 |
| number_inpatient | -0.006053 | -0.013405 | -0.047012 | -0.009154 | -0.032648 | 0.019240 | 0.033697 | 0.079929 | 0.009598 | -0.013909 | 0.037763 | -0.061114 | 0.066793 | 0.103471 | 0.279626 | 1.000000 | -0.004620 | 0.024244 | 0.032150 | 0.102473 | 0.038503 | -0.049379 | -0.073780 | 0.011936 | -0.006284 | -0.008317 | -0.016545 | -0.002118 | -0.022736 | -0.036659 | -0.003545 | -0.026804 | -0.021471 | 0.000411 | -0.003851 | -0.003669 | -0.003526 | NaN | NaN | 0.060505 | -0.008426 | -0.000813 | NaN | 0.001207 | -0.002118 | 0.025420 | 0.025559 | 0.233149 |
| DIAG_CAT_1 | 0.042924 | -0.034311 | 0.091837 | 0.023982 | 0.032151 | 0.034616 | -0.007753 | -0.019913 | 0.008458 | 0.018820 | -0.071046 | -0.056866 | 0.004288 | -0.009347 | -0.023803 | -0.004620 | 1.000000 | 0.025858 | 0.028021 | 0.046451 | -0.016030 | -0.091392 | 0.033199 | 0.002242 | -0.000440 | -0.002017 | 0.000410 | 0.001038 | 0.010541 | 0.017872 | 0.006039 | 0.024890 | 0.010041 | 0.003061 | 0.006030 | 0.000456 | 0.000745 | NaN | NaN | -0.075260 | 0.015281 | 0.004664 | NaN | -0.003029 | 0.003943 | -0.033688 | -0.028985 | -0.004994 |
| DIAG_CAT_2 | 0.029594 | 0.008083 | 0.077541 | 0.031824 | -0.005648 | 0.029774 | -0.019796 | 0.086503 | 0.036335 | -0.019354 | 0.011204 | 0.036607 | 0.084268 | 0.028015 | -0.004155 | 0.024244 | 0.025858 | 1.000000 | 0.081391 | 0.171521 | -0.017962 | -0.044930 | -0.018313 | 0.003082 | -0.000322 | -0.004128 | 0.006773 | 0.002264 | 0.004223 | 0.010435 | 0.000339 | 0.000030 | -0.010618 | 0.000704 | 0.005705 | 0.001300 | -0.003710 | NaN | NaN | -0.007776 | -0.007621 | 0.005659 | NaN | -0.000006 | -0.005116 | -0.006439 | -0.010210 | 0.011850 |
| DIAG_CAT_3 | 0.016000 | 0.008343 | 0.052021 | 0.014000 | -0.008918 | 0.024778 | 0.001447 | 0.068677 | 0.033135 | -0.015192 | 0.011021 | 0.025920 | 0.063166 | 0.026595 | 0.007427 | 0.032150 | 0.028021 | 0.081391 | 1.000000 | 0.186667 | -0.009693 | -0.031716 | -0.024179 | 0.005636 | 0.003922 | -0.007445 | -0.010677 | 0.000333 | -0.005554 | -0.005157 | -0.002879 | -0.008180 | -0.003303 | 0.000458 | -0.000319 | -0.000620 | -0.003145 | NaN | NaN | 0.013942 | -0.000101 | 0.006007 | NaN | 0.006032 | -0.004330 | 0.005824 | -0.007452 | 0.027877 |
| number_diagnoses | 0.081672 | -0.007818 | 0.243515 | 0.054391 | -0.113991 | 0.049496 | 0.076318 | 0.224265 | 0.076424 | -0.176693 | 0.149116 | 0.074394 | 0.263311 | 0.093518 | 0.059398 | 0.102473 | 0.046451 | 0.171521 | 0.186667 | 1.000000 | -0.036161 | -0.032983 | -0.073736 | 0.033225 | 0.012336 | -0.014080 | 0.013640 | 0.003449 | -0.005975 | -0.024247 | 0.001220 | 0.002278 | -0.011524 | 0.007741 | -0.000293 | 0.004710 | -0.013444 | NaN | NaN | 0.076730 | -0.005894 | -0.006428 | NaN | 0.003449 | -0.007491 | 0.055250 | 0.019375 | 0.103885 |
| max_glu_serum | 0.054576 | -0.001347 | 0.018618 | -0.037139 | 0.352793 | 0.037086 | 0.412356 | 0.029079 | -0.095739 | -0.003316 | -0.124907 | -0.069910 | 0.001639 | 0.054949 | 0.035679 | 0.038503 | -0.016030 | -0.017962 | -0.009693 | -0.036161 | 1.000000 | -0.043540 | -0.029790 | -0.015106 | -0.016794 | 0.008938 | -0.031840 | -0.000902 | 0.005931 | 0.000373 | 0.006437 | -0.014531 | -0.009275 | 0.005479 | -0.004328 | -0.001562 | -0.004032 | NaN | NaN | 0.000884 | -0.014296 | -0.002705 | NaN | -0.000902 | -0.000902 | 0.008958 | -0.005206 | 0.017684 |
| A1Cresult | -0.013318 | 0.016539 | -0.147559 | -0.021109 | -0.043929 | -0.020713 | 0.006512 | 0.058088 | -0.006824 | -0.009813 | 0.236383 | -0.017477 | 0.013044 | -0.024324 | -0.004270 | -0.049379 | -0.091392 | -0.044930 | -0.031716 | -0.032983 | -0.043540 | 1.000000 | 0.051894 | 0.022541 | -0.000669 | -0.003225 | 0.022787 | -0.001747 | 0.020844 | 0.009977 | -0.005526 | 0.000223 | 0.009548 | 0.009374 | 0.007741 | -0.003026 | -0.000390 | NaN | NaN | 0.107227 | -0.005008 | 0.001082 | NaN | -0.001747 | -0.001747 | 0.105614 | 0.086291 | -0.013614 |
| metformin | 0.010548 | 0.001549 | -0.060696 | 0.007304 | 0.008631 | -0.008376 | -0.033283 | -0.009071 | 0.027596 | 0.023068 | -0.044042 | -0.038122 | 0.069433 | -0.013006 | -0.009572 | -0.073780 | 0.033199 | -0.018313 | -0.024179 | -0.073736 | -0.029790 | 0.051894 | 1.000000 | -0.001074 | 0.020372 | -0.011841 | 0.047475 | -0.002068 | 0.077111 | 0.129061 | -0.006539 | 0.060566 | 0.097708 | 0.006246 | 0.005628 | -0.003582 | 0.004664 | NaN | NaN | -0.017392 | -0.021191 | -0.002748 | NaN | 0.008300 | 0.003116 | 0.325302 | 0.267566 | -0.035809 |
| repaglinide | 0.025466 | -0.004777 | 0.045565 | -0.005440 | -0.003481 | -0.002759 | -0.003732 | 0.034985 | 0.032986 | 0.010220 | 0.010438 | 0.005662 | 0.019283 | 0.001026 | 0.007820 | 0.011936 | 0.002242 | 0.003082 | 0.005636 | 0.033225 | -0.015106 | 0.022541 | -0.001074 | 1.000000 | -0.003246 | -0.003466 | -0.007518 | -0.000511 | -0.015927 | -0.024160 | -0.001617 | 0.019393 | 0.009031 | 0.011257 | 0.018066 | -0.000886 | -0.002287 | NaN | NaN | 0.006058 | -0.004506 | -0.001534 | NaN | -0.000511 | -0.000511 | 0.071294 | 0.066174 | 0.014286 |
| nateglinide | -0.004170 | -0.005390 | 0.020363 | 0.010707 | -0.008099 | -0.008790 | -0.019612 | 0.003320 | 0.014676 | 0.006590 | -0.008292 | -0.002359 | 0.023352 | 0.002719 | 0.005489 | -0.006284 | -0.000440 | -0.000322 | 0.003922 | 0.012336 | -0.016794 | -0.000669 | 0.020372 | -0.003246 | 1.000000 | -0.002386 | 0.004488 | -0.000352 | -0.018191 | -0.020817 | -0.001113 | 0.025830 | 0.013947 | -0.004585 | 0.018302 | -0.000610 | -0.001575 | NaN | NaN | 0.001396 | -0.006775 | -0.001056 | NaN | -0.000352 | -0.000352 | 0.052927 | 0.045552 | 0.007164 |
| chlorpropamide | 0.006801 | 0.006481 | 0.012367 | -0.000839 | 0.007875 | 0.018525 | 0.002666 | 0.004094 | -0.022046 | 0.002161 | -0.005659 | 0.004757 | -0.000940 | -0.004402 | -0.004218 | -0.008317 | -0.002017 | -0.004128 | -0.007445 | -0.014080 | 0.008938 | -0.003225 | -0.011841 | -0.003466 | -0.002386 | 1.000000 | -0.006537 | -0.000121 | -0.010745 | -0.005823 | -0.000383 | -0.007959 | -0.000123 | -0.001576 | -0.000581 | -0.000210 | -0.000541 | NaN | NaN | -0.020008 | -0.002329 | -0.000363 | NaN | -0.000121 | -0.000121 | -0.007035 | 0.015661 | -0.002806 |
| glimepiride | 0.008261 | -0.000156 | 0.044360 | 0.013694 | -0.003178 | -0.022360 | -0.026685 | 0.016086 | 0.038055 | 0.012798 | 0.005344 | 0.007223 | 0.045223 | -0.009039 | 0.003318 | -0.016545 | 0.000410 | 0.006773 | -0.010677 | 0.013640 | -0.031840 | 0.022787 | 0.047475 | -0.007518 | 0.004488 | -0.006537 | 1.000000 | -0.000964 | -0.071983 | -0.067334 | -0.003050 | 0.042601 | 0.038655 | 0.018418 | 0.019830 | 0.009191 | -0.004314 | NaN | NaN | 0.012479 | -0.012202 | -0.002894 | NaN | -0.000964 | -0.000964 | 0.138970 | 0.124797 | 0.004760 |
| acetohexamide | 0.001793 | -0.003935 | 0.002400 | -0.000736 | -0.002988 | 0.014597 | 0.001296 | 0.013596 | -0.004231 | -0.002891 | 0.005344 | 0.006615 | 0.009348 | -0.001242 | -0.000907 | -0.002118 | 0.001038 | 0.002264 | 0.000333 | 0.003449 | -0.000902 | -0.001747 | -0.002068 | -0.000511 | -0.000352 | -0.000121 | -0.000964 | 1.000000 | -0.001585 | -0.001414 | -0.000056 | -0.001174 | -0.001085 | -0.000233 | -0.000086 | -0.000031 | -0.000080 | NaN | NaN | 0.003607 | -0.000344 | -0.000054 | NaN | -0.000018 | -0.000018 | 0.004554 | 0.002311 | 0.002639 |
| glipizide | 0.018551 | 0.026810 | 0.055867 | 0.017062 | 0.007991 | -0.013379 | 0.009300 | 0.016737 | 0.005875 | 0.007273 | 0.012450 | 0.004999 | 0.056985 | 0.010527 | -0.003426 | -0.022736 | 0.010541 | 0.004223 | -0.005554 | -0.005975 | 0.005931 | 0.020844 | 0.077111 | -0.015927 | -0.018191 | -0.010745 | -0.071983 | -0.001585 | 1.000000 | -0.104495 | -0.005014 | 0.049752 | 0.041498 | 0.030598 | 0.002971 | -0.002746 | -0.001524 | NaN | NaN | -0.027179 | -0.027923 | -0.000607 | NaN | -0.001585 | -0.001585 | 0.194260 | 0.205145 | 0.014766 |
| glyburide | 0.015784 | 0.034631 | 0.076798 | 0.008707 | -0.002804 | 0.048256 | 0.004919 | 0.023482 | -0.047599 | -0.005929 | -0.001768 | 0.001531 | 0.030886 | -0.000482 | -0.027870 | -0.036659 | 0.017872 | 0.010435 | -0.005157 | -0.024247 | 0.000373 | 0.009977 | 0.129061 | -0.024160 | -0.020817 | -0.005823 | -0.067334 | -0.001414 | -0.104495 | 1.000000 | -0.004473 | 0.027727 | 0.030766 | 0.015094 | -0.000056 | -0.002450 | -0.006327 | NaN | NaN | -0.071853 | -0.006909 | 0.000245 | NaN | -0.001414 | -0.001414 | 0.172392 | 0.183024 | -0.004492 |
| tolbutamide | -0.001455 | -0.001727 | 0.010110 | -0.002326 | 0.006347 | 0.003228 | 0.001791 | 0.001799 | -0.002662 | -0.001944 | -0.001320 | -0.003423 | 0.002943 | 0.000350 | -0.002870 | -0.003545 | 0.006039 | 0.000339 | -0.002879 | 0.001220 | 0.006437 | -0.005526 | -0.006539 | -0.001617 | -0.001113 | -0.000383 | -0.003050 | -0.000056 | -0.005014 | -0.004473 | 1.000000 | -0.003714 | -0.003432 | -0.000736 | -0.000271 | -0.000098 | -0.000253 | NaN | NaN | -0.001925 | -0.001087 | -0.000169 | NaN | -0.000056 | -0.000056 | 0.001000 | 0.007308 | -0.007263 |
| pioglitazone | 0.026105 | 0.002339 | 0.013860 | 0.026059 | 0.018570 | -0.014116 | -0.005729 | 0.008521 | 0.034867 | 0.002210 | -0.015599 | 0.016471 | 0.071584 | 0.012212 | -0.001978 | -0.026804 | 0.024890 | 0.000030 | -0.008180 | 0.002278 | -0.014531 | 0.000223 | 0.060566 | 0.019393 | 0.025830 | -0.007959 | 0.042601 | -0.001174 | 0.049752 | 0.027727 | -0.003714 | 1.000000 | -0.062763 | 0.015377 | 0.000791 | -0.002034 | -0.001659 | NaN | NaN | 0.003954 | 0.022117 | 0.007190 | NaN | -0.001174 | 0.014894 | 0.203180 | 0.151949 | 0.011002 |
| rosiglitazone | 0.005938 | 0.010843 | 0.003034 | 0.004232 | 0.022930 | -0.001694 | -0.008894 | 0.008531 | -0.008782 | 0.016639 | -0.010260 | 0.018742 | 0.052860 | -0.001550 | -0.006844 | -0.021471 | 0.010041 | -0.010618 | -0.003303 | -0.011524 | -0.009275 | 0.009548 | 0.097708 | 0.009031 | 0.013947 | -0.000123 | 0.038655 | -0.001085 | 0.041498 | 0.030766 | -0.003432 | -0.062763 | 1.000000 | 0.002006 | 0.003416 | 0.008079 | -0.000996 | NaN | NaN | 0.004080 | 0.003340 | -0.003256 | NaN | -0.001085 | -0.001085 | 0.191641 | 0.140410 | 0.005522 |
| acarbose | 0.013237 | 0.010581 | 0.008092 | 0.010411 | 0.006061 | 0.006779 | -0.000753 | 0.007231 | -0.002629 | -0.005808 | -0.000654 | -0.000362 | 0.017947 | 0.009388 | 0.004224 | 0.000411 | 0.003061 | 0.000704 | 0.000458 | 0.007741 | 0.005479 | 0.009374 | 0.006246 | 0.011257 | -0.004585 | -0.001576 | 0.018418 | -0.000233 | 0.030598 | 0.015094 | -0.000736 | 0.015377 | 0.002006 | 1.000000 | -0.001117 | -0.000403 | -0.001040 | NaN | NaN | -0.001790 | 0.013046 | -0.000698 | NaN | -0.000233 | -0.000233 | 0.047261 | 0.030097 | 0.007816 |
| miglitol | -0.001307 | 0.009920 | 0.011788 | -0.003532 | -0.001414 | 0.005779 | -0.000763 | 0.005083 | 0.011455 | 0.002816 | -0.002963 | -0.001521 | 0.006422 | -0.002243 | -0.000207 | -0.003851 | 0.006030 | 0.005705 | -0.000319 | -0.000293 | -0.004328 | 0.007741 | 0.005628 | 0.018066 | 0.018302 | -0.000581 | 0.019830 | -0.000086 | 0.002971 | -0.000056 | -0.000271 | 0.000791 | 0.003416 | -0.001117 | 1.000000 | -0.000148 | -0.000383 | NaN | NaN | 0.000451 | -0.001650 | -0.000257 | NaN | -0.000086 | -0.000086 | 0.018472 | 0.011094 | 0.003413 |
| troglitazone | 0.003106 | 0.007860 | -0.001978 | -0.001274 | 0.003307 | 0.008684 | 0.002245 | 0.004746 | -0.007329 | -0.002660 | 0.005036 | -0.005745 | 0.002992 | -0.002152 | -0.001572 | -0.003669 | 0.000456 | 0.001300 | -0.000620 | 0.004710 | -0.001562 | -0.003026 | -0.003582 | -0.000886 | -0.000610 | -0.000210 | 0.009191 | -0.000031 | -0.002746 | -0.002450 | -0.000098 | -0.002034 | 0.008079 | -0.000403 | -0.000148 | 1.000000 | -0.000138 | NaN | NaN | -0.000391 | -0.000595 | -0.000093 | NaN | -0.000031 | -0.000031 | 0.007888 | 0.004002 | 0.001009 |
| tolazamide | 0.003990 | 0.003242 | 0.003605 | 0.000692 | 0.010291 | 0.013139 | 0.001834 | 0.000328 | -0.015677 | -0.004689 | 0.000008 | 0.005154 | -0.002113 | -0.005556 | -0.004059 | -0.003526 | 0.000745 | -0.003710 | -0.003145 | -0.013444 | -0.004032 | -0.000390 | 0.004664 | -0.002287 | -0.001575 | -0.000541 | -0.004314 | -0.000080 | -0.001524 | -0.006327 | -0.000253 | -0.001659 | -0.000996 | -0.001040 | -0.000383 | -0.000138 | 1.000000 | NaN | NaN | -0.013867 | -0.001537 | -0.000240 | NaN | -0.000080 | -0.000080 | -0.002376 | 0.010336 | -0.007513 |
| examide | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| citoglipton | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| insulin | -0.039862 | 0.000247 | -0.079078 | -0.076697 | -0.025368 | -0.041842 | 0.005094 | 0.101223 | 0.115265 | -0.014342 | 0.085401 | 0.015020 | 0.198963 | 0.010029 | 0.048501 | 0.060505 | -0.075260 | -0.007776 | 0.013942 | 0.076730 | 0.000884 | 0.107227 | -0.017392 | 0.006058 | 0.001396 | -0.020008 | 0.012479 | 0.003607 | -0.027179 | -0.071853 | -0.001925 | 0.003954 | 0.004080 | -0.001790 | 0.000451 | -0.000391 | -0.013867 | NaN | NaN | 1.000000 | 0.005828 | -0.000677 | NaN | 0.003607 | 0.003607 | 0.461502 | 0.525169 | 0.040750 |
| glyburide.metformin | 0.006384 | 0.002489 | -0.002451 | -0.014159 | -0.000573 | -0.002994 | -0.024616 | -0.006358 | 0.055730 | 0.000051 | -0.010852 | -0.000553 | 0.013382 | -0.008428 | 0.001956 | -0.008426 | 0.015281 | -0.007621 | -0.000101 | -0.005894 | -0.014296 | -0.005008 | -0.021191 | -0.004506 | -0.006775 | -0.002329 | -0.012202 | -0.000344 | -0.027923 | -0.006909 | -0.001087 | 0.022117 | 0.003340 | 0.013046 | -0.001650 | -0.000595 | -0.001537 | NaN | NaN | 0.005828 | 1.000000 | 0.050992 | NaN | -0.000344 | -0.000344 | 0.038712 | 0.044474 | -0.001842 |
| glipizide.metformin | 0.005380 | 0.007965 | 0.003658 | -0.002207 | -0.005046 | 0.000933 | -0.000281 | -0.001692 | 0.010871 | -0.006234 | -0.006685 | -0.006640 | 0.002757 | 0.003037 | -0.002723 | -0.000813 | 0.004664 | 0.005659 | 0.006007 | -0.006428 | -0.002705 | 0.001082 | -0.002748 | -0.001534 | -0.001056 | -0.000363 | -0.002894 | -0.000054 | -0.000607 | 0.000245 | -0.000169 | 0.007190 | -0.003256 | -0.000698 | -0.000257 | -0.000093 | -0.000240 | NaN | NaN | -0.000677 | 0.050992 | 1.000000 | NaN | -0.000054 | -0.000054 | 0.010838 | 0.006933 | 0.001747 |
| glimepiride.pioglitazone | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| metformin.rosiglitazone | -0.011726 | 0.004538 | 0.002400 | -0.000736 | -0.002988 | -0.002174 | 0.001296 | -0.003396 | 0.009326 | -0.002891 | 0.001689 | -0.003317 | -0.002603 | -0.001242 | -0.000907 | 0.001207 | -0.003029 | -0.000006 | 0.006032 | 0.003449 | -0.000902 | -0.001747 | 0.008300 | -0.000511 | -0.000352 | -0.000121 | -0.000964 | -0.000018 | -0.001585 | -0.001414 | -0.000056 | -0.001174 | -0.001085 | -0.000233 | -0.000086 | -0.000031 | -0.000080 | NaN | NaN | 0.003607 | -0.000344 | -0.000054 | NaN | 1.000000 | -0.000018 | 0.004554 | 0.002311 | -0.003530 |
| metformin.pioglitazone | 0.001793 | -0.003935 | -0.000257 | -0.000736 | 0.002888 | -0.000576 | -0.004958 | 0.002268 | -0.000358 | 0.010115 | -0.004330 | -0.000834 | 0.002074 | -0.001242 | -0.000907 | -0.002118 | 0.003943 | -0.005116 | -0.004330 | -0.007491 | -0.000902 | -0.001747 | 0.003116 | -0.000511 | -0.000352 | -0.000121 | -0.000964 | -0.000018 | -0.001585 | -0.001414 | -0.000056 | 0.014894 | -0.001085 | -0.000233 | -0.000086 | -0.000031 | -0.000080 | NaN | NaN | 0.003607 | -0.000344 | -0.000054 | NaN | -0.000018 | 1.000000 | 0.004554 | 0.002311 | -0.003530 |
| change | 0.008300 | 0.012476 | -0.037793 | -0.041219 | 0.003992 | -0.014047 | 0.002583 | 0.112359 | 0.121010 | -0.005111 | 0.062801 | 0.005976 | 0.248529 | 0.027105 | 0.041797 | 0.025420 | -0.033688 | -0.006439 | 0.005824 | 0.055250 | 0.008958 | 0.105614 | 0.325302 | 0.071294 | 0.052927 | -0.007035 | 0.138970 | 0.004554 | 0.194260 | 0.172392 | 0.001000 | 0.203180 | 0.191641 | 0.047261 | 0.018472 | 0.007888 | -0.002376 | NaN | NaN | 0.461502 | 0.038712 | 0.010838 | NaN | 0.004554 | 0.004554 | 1.000000 | 0.507411 | 0.046717 |
| diabetesMed | -0.004537 | 0.015391 | -0.025360 | -0.030585 | -0.003930 | -0.029452 | 0.000535 | 0.059464 | 0.077597 | -0.002299 | 0.030903 | -0.009904 | 0.186247 | 0.017340 | 0.029415 | 0.025559 | -0.028985 | -0.010210 | -0.007452 | 0.019375 | -0.005206 | 0.086291 | 0.267566 | 0.066174 | 0.045552 | 0.015661 | 0.124797 | 0.002311 | 0.205145 | 0.183024 | 0.007308 | 0.151949 | 0.140410 | 0.030097 | 0.011094 | 0.004002 | 0.010336 | NaN | NaN | 0.525169 | 0.044474 | 0.006933 | NaN | 0.002311 | 0.002311 | 0.507411 | 1.000000 | 0.058183 |
| readmitted | 0.014912 | -0.013626 | 0.029704 | 0.027236 | -0.008561 | 0.009300 | 0.030377 | 0.057129 | 0.004353 | -0.044800 | 0.035997 | -0.037714 | 0.050711 | 0.068145 | 0.103321 | 0.233149 | -0.004994 | 0.011850 | 0.027877 | 0.103885 | 0.017684 | -0.013614 | -0.035809 | 0.014286 | 0.007164 | -0.002806 | 0.004760 | 0.002639 | 0.014766 | -0.004492 | -0.007263 | 0.011002 | 0.005522 | 0.007816 | 0.003413 | 0.001009 | -0.007513 | NaN | NaN | 0.040750 | -0.001842 | 0.001747 | NaN | -0.003530 | -0.003530 | 0.046717 | 0.058183 | 1.000000 |
Correlation Analysis Notes:¶
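Strongest +'s (correlation with readmitted): number_inpatient (0.23), number_diagnoses (0.10), number_emergency (0.10), number_outpatient (0.07), diabetesMed (0.06), time_in_hospital (0.06)
Strongest -'s (correlation with readmitted): MED_SPEC_NUM (-0.04), num_procedures (-0.04), metformin (-0.04); all of these are weak
NaN rows/columns: examide, citoglipton and glimepiride.pioglitazone are 0 for every record, so their correlations are undefined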
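One way to read the strongest positive and negative correlations off the matrix, instead of scanning it by eye, is to take the `readmitted` column of `df.corr()` and sort it. The sketch below uses a small made-up frame (its values are illustrative, not taken from the dataset), and `unused_med` is a hypothetical constant column showing how NaN entries drop out.

```python
import pandas as pd

# Made-up data for illustration only; column names mirror the notebook's,
# "unused_med" is a hypothetical all-zero column.
toy = pd.DataFrame({
    "number_inpatient": [0, 1, 2, 3, 4, 5],
    "num_procedures":   [3, 5, 2, 4, 1, 0],
    "unused_med":       [0, 0, 0, 0, 0, 0],
    "readmitted":       [0, 0, 1, 1, 2, 2],
})

# Correlation of every feature with the target, strongest positive first.
# Constant columns produce NaN and are removed by dropna().
target_corr = (
    toy.corr()["readmitted"]
    .drop("readmitted")
    .dropna()
    .sort_values(ascending=False)
)

print(target_corr)
```

The head of the sorted series gives the strongest positives and the tail gives the strongest negatives; on the real data the same three lines would be applied to `df`.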
# Drop these medication columns: no patient in the data was given them (every value is 0),
# so they carry no information and only produce NaN in the correlation matrix
df = df.drop(['examide', 'citoglipton', 'glimepiride.pioglitazone'], axis=1)
df.head(5)
| | race | gender | age | weight | admission_type_id | discharge_disposition_id | admission_source_id | time_in_hospital | payer_code | MED_SPEC_NUM | num_lab_procedures | num_procedures | num_medications | number_outpatient | number_emergency | number_inpatient | DIAG_CAT_1 | DIAG_CAT_2 | DIAG_CAT_3 | number_diagnoses | max_glu_serum | A1Cresult | metformin | repaglinide | nateglinide | chlorpropamide | glimepiride | acetohexamide | glipizide | glyburide | tolbutamide | pioglitazone | rosiglitazone | acarbose | miglitol | troglitazone | tolazamide | insulin | glyburide.metformin | glipizide.metformin | metformin.rosiglitazone | metformin.pioglitazone | change | diabetesMed | readmitted |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 3 | 0 | 8 | 0 | 3 | 1 | 4 | 5 | 0 | 0 | 39 | 3 | 11 | 0 | 0 | 0 | 10 | 4 | 18 | 7 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 2 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 1 |
| 1 | 3 | 0 | 7 | 0 | 5 | 3 | 1 | 6 | 7 | 0 | 79 | 1 | 25 | 3 | 0 | 0 | 16 | 13 | 16 | 9 | 0 | 0 | 2 | 0 | 0 | 0 | 0 | 0 | 0 | 2 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 3 | 0 | 0 | 0 | 0 | 1 | 1 | 0 |
| 2 | 5 | 0 | 6 | 0 | 1 | 22 | 7 | 4 | 14 | 18 | 29 | 2 | 18 | 0 | 0 | 1 | 24 | 18 | 2 | 9 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 2 | 0 | 0 | 0 | 0 | 0 | 1 | 0 |
| 3 | 3 | 1 | 7 | 0 | 1 | 1 | 7 | 3 | 0 | 18 | 72 | 3 | 18 | 0 | 0 | 0 | 17 | 4 | 3 | 9 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 2 | 0 | 0 | 0 | 0 | 0 | 1 | 1 |
| 4 | 3 | 0 | 3 | 0 | 2 | 1 | 1 | 3 | 0 | 0 | 21 | 1 | 6 | 0 | 0 | 0 | 23 | 18 | 32 | 9 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
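The three dropped columns were picked out by hand from the statistics and correlation tables. As a hedged alternative, constant columns can be detected programmatically with `nunique()`. The sketch below runs on a toy frame; its values are invented for illustration.

```python
import pandas as pd

# Toy frame with two all-zero medication columns (values are invented).
toy = pd.DataFrame({
    "insulin":     [0, 2, 1, 3],
    "examide":     [0, 0, 0, 0],
    "citoglipton": [0, 0, 0, 0],
    "readmitted":  [1, 0, 0, 2],
})

# A column with at most one distinct value cannot help any model.
constant_cols = [col for col in toy.columns if toy[col].nunique() <= 1]

# Same drop call style as the notebook: labels plus axis=1.
toy = toy.drop(constant_cols, axis=1)

print(constant_cols)
print(toy.columns.tolist())
```

This only catches columns that are strictly constant. Medications with a handful of non-zero rows, such as acetohexamide, would need a separate threshold on how often they are used.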
#basic statistics
df.describe()
| | race | gender | age | weight | admission_type_id | discharge_disposition_id | admission_source_id | time_in_hospital | payer_code | MED_SPEC_NUM | num_lab_procedures | num_procedures | num_medications | number_outpatient | number_emergency | number_inpatient | DIAG_CAT_1 | DIAG_CAT_2 | DIAG_CAT_3 | number_diagnoses | max_glu_serum | A1Cresult | metformin | repaglinide | nateglinide | chlorpropamide | glimepiride | acetohexamide | glipizide | glyburide | tolbutamide | pioglitazone | rosiglitazone | acarbose | miglitol | troglitazone | tolazamide | insulin | glyburide.metformin | glipizide.metformin | metformin.rosiglitazone | metformin.pioglitazone | change | diabetesMed | readmitted |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| count | 56000.000000 | 56000.000000 | 56000.000000 | 56000.000000 | 56000.000000 | 56000.000000 | 56000.000000 | 56000.000000 | 56000.000000 | 56000.000000 | 56000.000000 | 56000.000000 | 56000.000000 | 56000.000000 | 56000.000000 | 56000.000000 | 56000.000000 | 56000.000000 | 56000.000000 | 56000.000000 | 56000.000000 | 56000.000000 | 56000.000000 | 56000.000000 | 56000.000000 | 56000.000000 | 56000.000000 | 56000.000000 | 56000.000000 | 56000.000000 | 56000.000000 | 56000.000000 | 56000.000000 | 56000.000000 | 56000.000000 | 56000.000000 | 56000.000000 | 56000.000000 | 56000.000000 | 56000.000000 | 56000.000000 | 56000.000000 | 56000.000000 | 56000.000000 | 56000.000000 |
| mean | 2.602071 | 0.464464 | 6.096589 | 0.123946 | 2.016893 | 3.721821 | 5.756643 | 4.398161 | 4.369375 | 10.668643 | 43.141661 | 1.335893 | 16.009268 | 0.367321 | 0.196875 | 0.637054 | 14.213321 | 12.011054 | 11.357411 | 7.423750 | 0.092089 | 0.368375 | 0.398875 | 0.029911 | 0.014089 | 0.001732 | 0.102536 | 0.000036 | 0.254732 | 0.210071 | 0.000357 | 0.146161 | 0.125821 | 0.006214 | 0.000857 | 0.000107 | 0.000714 | 1.058839 | 0.013214 | 0.000321 | 0.000036 | 0.000036 | 0.462679 | 0.769821 | 0.572268 |
| std | 0.937754 | 0.498740 | 1.590761 | 0.712004 | 1.438340 | 5.291517 | 4.053838 | 2.984346 | 4.363828 | 15.595799 | 19.656507 | 1.702009 | 8.132455 | 1.249570 | 0.916820 | 1.270768 | 7.272908 | 7.443902 | 8.157131 | 1.931488 | 0.431655 | 0.890972 | 0.815169 | 0.247161 | 0.169132 | 0.060480 | 0.449274 | 0.008452 | 0.678992 | 0.627625 | 0.026724 | 0.525985 | 0.490002 | 0.112904 | 0.042249 | 0.014638 | 0.037790 | 1.102484 | 0.162472 | 0.025353 | 0.008452 | 0.008452 | 0.498610 | 0.420951 | 0.685018 |
| min | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 0.000000 | 0.000000 | 1.000000 | 0.000000 | 1.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 1.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 |
| 25% | 3.000000 | 0.000000 | 5.000000 | 0.000000 | 1.000000 | 1.000000 | 1.000000 | 2.000000 | 0.000000 | 0.000000 | 32.000000 | 0.000000 | 10.000000 | 0.000000 | 0.000000 | 0.000000 | 10.000000 | 4.000000 | 3.000000 | 6.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 1.000000 | 0.000000 |
| 50% | 3.000000 | 0.000000 | 6.000000 | 0.000000 | 1.000000 | 1.000000 | 7.000000 | 4.000000 | 6.000000 | 4.000000 | 44.000000 | 1.000000 | 15.000000 | 0.000000 | 0.000000 | 0.000000 | 15.000000 | 12.000000 | 10.000000 | 8.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 1.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 1.000000 | 0.000000 |
| 75% | 3.000000 | 1.000000 | 7.000000 | 0.000000 | 3.000000 | 4.000000 | 7.000000 | 6.000000 | 7.000000 | 18.000000 | 57.000000 | 2.000000 | 20.000000 | 0.000000 | 0.000000 | 1.000000 | 18.000000 | 17.000000 | 17.000000 | 9.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 2.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 1.000000 | 1.000000 | 1.000000 |
| max | 5.000000 | 1.000000 | 9.000000 | 9.000000 | 8.000000 | 28.000000 | 25.000000 | 14.000000 | 16.000000 | 63.000000 | 132.000000 | 6.000000 | 75.000000 | 42.000000 | 76.000000 | 21.000000 | 32.000000 | 32.000000 | 32.000000 | 16.000000 | 3.000000 | 3.000000 | 3.000000 | 3.000000 | 3.000000 | 3.000000 | 3.000000 | 2.000000 | 3.000000 | 3.000000 | 2.000000 | 3.000000 | 3.000000 | 3.000000 | 3.000000 | 2.000000 | 2.000000 | 3.000000 | 3.000000 | 2.000000 | 2.000000 | 2.000000 | 1.000000 | 1.000000 | 2.000000 |
# correlation analysis
df.corr()
| | race | gender | age | weight | admission_type_id | discharge_disposition_id | admission_source_id | time_in_hospital | payer_code | MED_SPEC_NUM | num_lab_procedures | num_procedures | num_medications | number_outpatient | number_emergency | number_inpatient | DIAG_CAT_1 | DIAG_CAT_2 | DIAG_CAT_3 | number_diagnoses | max_glu_serum | A1Cresult | metformin | repaglinide | nateglinide | chlorpropamide | glimepiride | acetohexamide | glipizide | glyburide | tolbutamide | pioglitazone | rosiglitazone | acarbose | miglitol | troglitazone | tolazamide | insulin | glyburide.metformin | glipizide.metformin | metformin.rosiglitazone | metformin.pioglitazone | change | diabetesMed | readmitted |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| race | 1.000000 | 0.061706 | 0.114255 | 0.040520 | 0.096587 | 0.005805 | 0.033113 | -0.020364 | 0.041640 | -0.030777 | -0.023193 | 0.024391 | 0.022157 | 0.050845 | -0.012812 | -0.006053 | 0.042924 | 0.029594 | 0.016000 | 0.081672 | 0.054576 | -0.013318 | 0.010548 | 0.025466 | -0.004170 | 0.006801 | 0.008261 | 0.001793 | 0.018551 | 0.015784 | -0.001455 | 0.026105 | 0.005938 | 0.013237 | -0.001307 | 0.003106 | 0.003990 | -0.039862 | 0.006384 | 0.005380 | -0.011726 | 0.001793 | 0.008300 | -0.004537 | 0.014912 |
| gender | 0.061706 | 1.000000 | -0.048579 | 0.014491 | 0.014578 | -0.019566 | -0.005222 | -0.031088 | 0.000833 | 0.016623 | -0.004968 | 0.061668 | -0.023819 | -0.005846 | -0.024202 | -0.013405 | -0.034311 | 0.008083 | 0.008343 | -0.007818 | -0.001347 | 0.016539 | 0.001549 | -0.004777 | -0.005390 | 0.006481 | -0.000156 | -0.003935 | 0.026810 | 0.034631 | -0.001727 | 0.002339 | 0.010843 | 0.010581 | 0.009920 | 0.007860 | 0.003242 | 0.000247 | 0.002489 | 0.007965 | 0.004538 | -0.003935 | 0.012476 | 0.015391 | -0.013626 |
| age | 0.114255 | -0.048579 | 1.000000 | 0.005716 | -0.005747 | 0.113970 | 0.041070 | 0.107273 | 0.058032 | -0.068202 | 0.025665 | -0.028360 | 0.039010 | 0.029064 | -0.089149 | -0.047012 | 0.091837 | 0.077541 | 0.052021 | 0.243515 | 0.018618 | -0.147559 | -0.060696 | 0.045565 | 0.020363 | 0.012367 | 0.044360 | 0.002400 | 0.055867 | 0.076798 | 0.010110 | 0.013860 | 0.003034 | 0.008092 | 0.011788 | -0.001978 | 0.003605 | -0.079078 | -0.002451 | 0.003658 | 0.002400 | -0.000257 | -0.037793 | -0.025360 | 0.029704 |
| weight | 0.040520 | 0.014491 | 0.005716 | 1.000000 | 0.037503 | -0.035383 | 0.003026 | 0.023652 | 0.047819 | 0.004630 | 0.090456 | 0.018693 | 0.011274 | 0.104440 | 0.003706 | -0.009154 | 0.023982 | 0.031824 | 0.014000 | 0.054391 | -0.037139 | -0.021109 | 0.007304 | -0.005440 | 0.010707 | -0.000839 | 0.013694 | -0.000736 | 0.017062 | 0.008707 | -0.002326 | 0.026059 | 0.004232 | 0.010411 | -0.003532 | -0.001274 | 0.000692 | -0.076697 | -0.014159 | -0.002207 | -0.000736 | -0.000736 | -0.041219 | -0.030585 | 0.027236 |
| admission_type_id | 0.096587 | 0.014578 | -0.005747 | 0.037503 | 1.000000 | 0.085986 | 0.098007 | -0.014285 | -0.136863 | 0.185351 | -0.145869 | 0.131923 | 0.075711 | 0.030746 | -0.018190 | -0.032648 | 0.032151 | -0.005648 | -0.008918 | -0.113991 | 0.352793 | -0.043929 | 0.008631 | -0.003481 | -0.008099 | 0.007875 | -0.003178 | -0.002988 | 0.007991 | -0.002804 | 0.006347 | 0.018570 | 0.022930 | 0.006061 | -0.001414 | 0.003307 | 0.010291 | -0.025368 | -0.000573 | -0.005046 | -0.002988 | 0.002888 | 0.003992 | -0.003930 | -0.008561 |
| discharge_disposition_id | 0.005805 | -0.019566 | 0.113970 | -0.035383 | 0.085986 | 1.000000 | 0.016614 | 0.161954 | -0.123220 | -0.024028 | 0.022906 | 0.015536 | 0.105415 | -0.006101 | -0.024692 | 0.019240 | 0.034616 | 0.029774 | 0.024778 | 0.049496 | 0.037086 | -0.020713 | -0.008376 | -0.002759 | -0.008790 | 0.018525 | -0.022360 | 0.014597 | -0.013379 | 0.048256 | 0.003228 | -0.014116 | -0.001694 | 0.006779 | 0.005779 | 0.008684 | 0.013139 | -0.041842 | -0.002994 | 0.000933 | -0.002174 | -0.000576 | -0.014047 | -0.029452 | 0.009300 |
| admission_source_id | 0.033113 | -0.005222 | 0.041070 | 0.003026 | 0.098007 | 0.016614 | 1.000000 | -0.006996 | -0.100157 | -0.152760 | 0.046823 | -0.137044 | -0.055016 | 0.028833 | 0.061938 | 0.033697 | -0.007753 | -0.019796 | 0.001447 | 0.076318 | 0.412356 | 0.006512 | -0.033283 | -0.003732 | -0.019612 | 0.002666 | -0.026685 | 0.001296 | 0.009300 | 0.004919 | 0.001791 | -0.005729 | -0.008894 | -0.000753 | -0.000763 | 0.002245 | 0.001834 | 0.005094 | -0.024616 | -0.000281 | 0.001296 | -0.004958 | 0.002583 | 0.000535 | 0.030377 |
| time_in_hospital | -0.020364 | -0.031088 | 0.107273 | 0.023652 | -0.014285 | 0.161954 | -0.006996 | 1.000000 | -0.037805 | 0.023146 | 0.318234 | 0.193139 | 0.468752 | -0.003410 | -0.005467 | 0.079929 | -0.019913 | 0.086503 | 0.068677 | 0.224265 | 0.029079 | 0.058088 | -0.009071 | 0.034985 | 0.003320 | 0.004094 | 0.016086 | 0.013596 | 0.016737 | 0.023482 | 0.001799 | 0.008521 | 0.008531 | 0.007231 | 0.005083 | 0.004746 | 0.000328 | 0.101223 | -0.006358 | -0.001692 | -0.003396 | 0.002268 | 0.112359 | 0.059464 | 0.057129 |
| payer_code | 0.041640 | 0.000833 | 0.058032 | 0.047819 | -0.136863 | -0.123220 | -0.100157 | -0.037805 | 1.000000 | -0.082746 | -0.049680 | -0.047581 | 0.005658 | 0.062572 | 0.067316 | 0.009598 | 0.008458 | 0.036335 | 0.033135 | 0.076424 | -0.095739 | -0.006824 | 0.027596 | 0.032986 | 0.014676 | -0.022046 | 0.038055 | -0.004231 | 0.005875 | -0.047599 | -0.002662 | 0.034867 | -0.008782 | -0.002629 | 0.011455 | -0.007329 | -0.015677 | 0.115265 | 0.055730 | 0.010871 | 0.009326 | -0.000358 | 0.121010 | 0.077597 | 0.004353 |
| MED_SPEC_NUM | -0.030777 | 0.016623 | -0.068202 | 0.004630 | 0.185351 | -0.024028 | -0.152760 | 0.023146 | -0.082746 | 1.000000 | -0.068863 | 0.076952 | 0.036943 | -0.051445 | -0.009879 | -0.013909 | 0.018820 | -0.019354 | -0.015192 | -0.176693 | -0.003316 | -0.009813 | 0.023068 | 0.010220 | 0.006590 | 0.002161 | 0.012798 | -0.002891 | 0.007273 | -0.005929 | -0.001944 | 0.002210 | 0.016639 | -0.005808 | 0.002816 | -0.002660 | -0.004689 | -0.014342 | 0.000051 | -0.006234 | -0.002891 | 0.010115 | -0.005111 | -0.002299 | -0.044800 |
| num_lab_procedures | -0.023193 | -0.004968 | 0.025665 | 0.090456 | -0.145869 | 0.022906 | 0.046823 | 0.318234 | -0.049680 | -0.068863 | 1.000000 | 0.055081 | 0.267707 | -0.008437 | 0.000613 | 0.037763 | -0.071046 | 0.011204 | 0.011021 | 0.149116 | -0.124907 | 0.236383 | -0.044042 | 0.010438 | -0.008292 | -0.005659 | 0.005344 | 0.005344 | 0.012450 | -0.001768 | -0.001320 | -0.015599 | -0.010260 | -0.000654 | -0.002963 | 0.005036 | 0.000008 | 0.085401 | -0.010852 | -0.006685 | 0.001689 | -0.004330 | 0.062801 | 0.030903 | 0.035997 |
| num_procedures | 0.024391 | 0.061668 | -0.028360 | 0.018693 | 0.131923 | 0.015536 | -0.137044 | 0.193139 | -0.047581 | 0.076952 | 0.055081 | 1.000000 | 0.387685 | -0.028257 | -0.033659 | -0.061114 | -0.056866 | 0.036607 | 0.025920 | 0.074394 | -0.069910 | -0.017477 | -0.038122 | 0.005662 | -0.002359 | 0.004757 | 0.007223 | 0.006615 | 0.004999 | 0.001531 | -0.003423 | 0.016471 | 0.018742 | -0.000362 | -0.001521 | -0.005745 | 0.005154 | 0.015020 | -0.000553 | -0.006640 | -0.003317 | -0.000834 | 0.005976 | -0.009904 | -0.037714 |
| num_medications | 0.022157 | -0.023819 | 0.039010 | 0.011274 | 0.075711 | 0.105415 | -0.055016 | 0.468752 | 0.005658 | 0.036943 | 0.267707 | 0.387685 | 1.000000 | 0.047313 | 0.017129 | 0.066793 | 0.004288 | 0.084268 | 0.063166 | 0.263311 | 0.001639 | 0.013044 | 0.069433 | 0.019283 | 0.023352 | -0.000940 | 0.045223 | 0.009348 | 0.056985 | 0.030886 | 0.002943 | 0.071584 | 0.052860 | 0.017947 | 0.006422 | 0.002992 | -0.002113 | 0.198963 | 0.013382 | 0.002757 | -0.002603 | 0.002074 | 0.248529 | 0.186247 | 0.050711 |
| number_outpatient | 0.050845 | -0.005846 | 0.029064 | 0.104440 | 0.030746 | -0.006101 | 0.028833 | -0.003410 | 0.062572 | -0.051445 | -0.008437 | -0.028257 | 0.047313 | 1.000000 | 0.087824 | 0.103471 | -0.009347 | 0.028015 | 0.026595 | 0.093518 | 0.054949 | -0.024324 | -0.013006 | 0.001026 | 0.002719 | -0.004402 | -0.009039 | -0.001242 | 0.010527 | -0.000482 | 0.000350 | 0.012212 | -0.001550 | 0.009388 | -0.002243 | -0.002152 | -0.005556 | 0.010029 | -0.008428 | 0.003037 | -0.001242 | -0.001242 | 0.027105 | 0.017340 | 0.068145 |
| number_emergency | -0.012812 | -0.024202 | -0.089149 | 0.003706 | -0.018190 | -0.024692 | 0.061938 | -0.005467 | 0.067316 | -0.009879 | 0.000613 | -0.033659 | 0.017129 | 0.087824 | 1.000000 | 0.279626 | -0.023803 | -0.004155 | 0.007427 | 0.059398 | 0.035679 | -0.004270 | -0.009572 | 0.007820 | 0.005489 | -0.004218 | 0.003318 | -0.000907 | -0.003426 | -0.027870 | -0.002870 | -0.001978 | -0.006844 | 0.004224 | -0.000207 | -0.001572 | -0.004059 | 0.048501 | 0.001956 | -0.002723 | -0.000907 | -0.000907 | 0.041797 | 0.029415 | 0.103321 |
| number_inpatient | -0.006053 | -0.013405 | -0.047012 | -0.009154 | -0.032648 | 0.019240 | 0.033697 | 0.079929 | 0.009598 | -0.013909 | 0.037763 | -0.061114 | 0.066793 | 0.103471 | 0.279626 | 1.000000 | -0.004620 | 0.024244 | 0.032150 | 0.102473 | 0.038503 | -0.049379 | -0.073780 | 0.011936 | -0.006284 | -0.008317 | -0.016545 | -0.002118 | -0.022736 | -0.036659 | -0.003545 | -0.026804 | -0.021471 | 0.000411 | -0.003851 | -0.003669 | -0.003526 | 0.060505 | -0.008426 | -0.000813 | 0.001207 | -0.002118 | 0.025420 | 0.025559 | 0.233149 |
| DIAG_CAT_1 | 0.042924 | -0.034311 | 0.091837 | 0.023982 | 0.032151 | 0.034616 | -0.007753 | -0.019913 | 0.008458 | 0.018820 | -0.071046 | -0.056866 | 0.004288 | -0.009347 | -0.023803 | -0.004620 | 1.000000 | 0.025858 | 0.028021 | 0.046451 | -0.016030 | -0.091392 | 0.033199 | 0.002242 | -0.000440 | -0.002017 | 0.000410 | 0.001038 | 0.010541 | 0.017872 | 0.006039 | 0.024890 | 0.010041 | 0.003061 | 0.006030 | 0.000456 | 0.000745 | -0.075260 | 0.015281 | 0.004664 | -0.003029 | 0.003943 | -0.033688 | -0.028985 | -0.004994 |
| DIAG_CAT_2 | 0.029594 | 0.008083 | 0.077541 | 0.031824 | -0.005648 | 0.029774 | -0.019796 | 0.086503 | 0.036335 | -0.019354 | 0.011204 | 0.036607 | 0.084268 | 0.028015 | -0.004155 | 0.024244 | 0.025858 | 1.000000 | 0.081391 | 0.171521 | -0.017962 | -0.044930 | -0.018313 | 0.003082 | -0.000322 | -0.004128 | 0.006773 | 0.002264 | 0.004223 | 0.010435 | 0.000339 | 0.000030 | -0.010618 | 0.000704 | 0.005705 | 0.001300 | -0.003710 | -0.007776 | -0.007621 | 0.005659 | -0.000006 | -0.005116 | -0.006439 | -0.010210 | 0.011850 |
| DIAG_CAT_3 | 0.016000 | 0.008343 | 0.052021 | 0.014000 | -0.008918 | 0.024778 | 0.001447 | 0.068677 | 0.033135 | -0.015192 | 0.011021 | 0.025920 | 0.063166 | 0.026595 | 0.007427 | 0.032150 | 0.028021 | 0.081391 | 1.000000 | 0.186667 | -0.009693 | -0.031716 | -0.024179 | 0.005636 | 0.003922 | -0.007445 | -0.010677 | 0.000333 | -0.005554 | -0.005157 | -0.002879 | -0.008180 | -0.003303 | 0.000458 | -0.000319 | -0.000620 | -0.003145 | 0.013942 | -0.000101 | 0.006007 | 0.006032 | -0.004330 | 0.005824 | -0.007452 | 0.027877 |
| number_diagnoses | 0.081672 | -0.007818 | 0.243515 | 0.054391 | -0.113991 | 0.049496 | 0.076318 | 0.224265 | 0.076424 | -0.176693 | 0.149116 | 0.074394 | 0.263311 | 0.093518 | 0.059398 | 0.102473 | 0.046451 | 0.171521 | 0.186667 | 1.000000 | -0.036161 | -0.032983 | -0.073736 | 0.033225 | 0.012336 | -0.014080 | 0.013640 | 0.003449 | -0.005975 | -0.024247 | 0.001220 | 0.002278 | -0.011524 | 0.007741 | -0.000293 | 0.004710 | -0.013444 | 0.076730 | -0.005894 | -0.006428 | 0.003449 | -0.007491 | 0.055250 | 0.019375 | 0.103885 |
| max_glu_serum | 0.054576 | -0.001347 | 0.018618 | -0.037139 | 0.352793 | 0.037086 | 0.412356 | 0.029079 | -0.095739 | -0.003316 | -0.124907 | -0.069910 | 0.001639 | 0.054949 | 0.035679 | 0.038503 | -0.016030 | -0.017962 | -0.009693 | -0.036161 | 1.000000 | -0.043540 | -0.029790 | -0.015106 | -0.016794 | 0.008938 | -0.031840 | -0.000902 | 0.005931 | 0.000373 | 0.006437 | -0.014531 | -0.009275 | 0.005479 | -0.004328 | -0.001562 | -0.004032 | 0.000884 | -0.014296 | -0.002705 | -0.000902 | -0.000902 | 0.008958 | -0.005206 | 0.017684 |
| A1Cresult | -0.013318 | 0.016539 | -0.147559 | -0.021109 | -0.043929 | -0.020713 | 0.006512 | 0.058088 | -0.006824 | -0.009813 | 0.236383 | -0.017477 | 0.013044 | -0.024324 | -0.004270 | -0.049379 | -0.091392 | -0.044930 | -0.031716 | -0.032983 | -0.043540 | 1.000000 | 0.051894 | 0.022541 | -0.000669 | -0.003225 | 0.022787 | -0.001747 | 0.020844 | 0.009977 | -0.005526 | 0.000223 | 0.009548 | 0.009374 | 0.007741 | -0.003026 | -0.000390 | 0.107227 | -0.005008 | 0.001082 | -0.001747 | -0.001747 | 0.105614 | 0.086291 | -0.013614 |
| metformin | 0.010548 | 0.001549 | -0.060696 | 0.007304 | 0.008631 | -0.008376 | -0.033283 | -0.009071 | 0.027596 | 0.023068 | -0.044042 | -0.038122 | 0.069433 | -0.013006 | -0.009572 | -0.073780 | 0.033199 | -0.018313 | -0.024179 | -0.073736 | -0.029790 | 0.051894 | 1.000000 | -0.001074 | 0.020372 | -0.011841 | 0.047475 | -0.002068 | 0.077111 | 0.129061 | -0.006539 | 0.060566 | 0.097708 | 0.006246 | 0.005628 | -0.003582 | 0.004664 | -0.017392 | -0.021191 | -0.002748 | 0.008300 | 0.003116 | 0.325302 | 0.267566 | -0.035809 |
| repaglinide | 0.025466 | -0.004777 | 0.045565 | -0.005440 | -0.003481 | -0.002759 | -0.003732 | 0.034985 | 0.032986 | 0.010220 | 0.010438 | 0.005662 | 0.019283 | 0.001026 | 0.007820 | 0.011936 | 0.002242 | 0.003082 | 0.005636 | 0.033225 | -0.015106 | 0.022541 | -0.001074 | 1.000000 | -0.003246 | -0.003466 | -0.007518 | -0.000511 | -0.015927 | -0.024160 | -0.001617 | 0.019393 | 0.009031 | 0.011257 | 0.018066 | -0.000886 | -0.002287 | 0.006058 | -0.004506 | -0.001534 | -0.000511 | -0.000511 | 0.071294 | 0.066174 | 0.014286 |
| nateglinide | -0.004170 | -0.005390 | 0.020363 | 0.010707 | -0.008099 | -0.008790 | -0.019612 | 0.003320 | 0.014676 | 0.006590 | -0.008292 | -0.002359 | 0.023352 | 0.002719 | 0.005489 | -0.006284 | -0.000440 | -0.000322 | 0.003922 | 0.012336 | -0.016794 | -0.000669 | 0.020372 | -0.003246 | 1.000000 | -0.002386 | 0.004488 | -0.000352 | -0.018191 | -0.020817 | -0.001113 | 0.025830 | 0.013947 | -0.004585 | 0.018302 | -0.000610 | -0.001575 | 0.001396 | -0.006775 | -0.001056 | -0.000352 | -0.000352 | 0.052927 | 0.045552 | 0.007164 |
| chlorpropamide | 0.006801 | 0.006481 | 0.012367 | -0.000839 | 0.007875 | 0.018525 | 0.002666 | 0.004094 | -0.022046 | 0.002161 | -0.005659 | 0.004757 | -0.000940 | -0.004402 | -0.004218 | -0.008317 | -0.002017 | -0.004128 | -0.007445 | -0.014080 | 0.008938 | -0.003225 | -0.011841 | -0.003466 | -0.002386 | 1.000000 | -0.006537 | -0.000121 | -0.010745 | -0.005823 | -0.000383 | -0.007959 | -0.000123 | -0.001576 | -0.000581 | -0.000210 | -0.000541 | -0.020008 | -0.002329 | -0.000363 | -0.000121 | -0.000121 | -0.007035 | 0.015661 | -0.002806 |
| glimepiride | 0.008261 | -0.000156 | 0.044360 | 0.013694 | -0.003178 | -0.022360 | -0.026685 | 0.016086 | 0.038055 | 0.012798 | 0.005344 | 0.007223 | 0.045223 | -0.009039 | 0.003318 | -0.016545 | 0.000410 | 0.006773 | -0.010677 | 0.013640 | -0.031840 | 0.022787 | 0.047475 | -0.007518 | 0.004488 | -0.006537 | 1.000000 | -0.000964 | -0.071983 | -0.067334 | -0.003050 | 0.042601 | 0.038655 | 0.018418 | 0.019830 | 0.009191 | -0.004314 | 0.012479 | -0.012202 | -0.002894 | -0.000964 | -0.000964 | 0.138970 | 0.124797 | 0.004760 |
| acetohexamide | 0.001793 | -0.003935 | 0.002400 | -0.000736 | -0.002988 | 0.014597 | 0.001296 | 0.013596 | -0.004231 | -0.002891 | 0.005344 | 0.006615 | 0.009348 | -0.001242 | -0.000907 | -0.002118 | 0.001038 | 0.002264 | 0.000333 | 0.003449 | -0.000902 | -0.001747 | -0.002068 | -0.000511 | -0.000352 | -0.000121 | -0.000964 | 1.000000 | -0.001585 | -0.001414 | -0.000056 | -0.001174 | -0.001085 | -0.000233 | -0.000086 | -0.000031 | -0.000080 | 0.003607 | -0.000344 | -0.000054 | -0.000018 | -0.000018 | 0.004554 | 0.002311 | 0.002639 |
| glipizide | 0.018551 | 0.026810 | 0.055867 | 0.017062 | 0.007991 | -0.013379 | 0.009300 | 0.016737 | 0.005875 | 0.007273 | 0.012450 | 0.004999 | 0.056985 | 0.010527 | -0.003426 | -0.022736 | 0.010541 | 0.004223 | -0.005554 | -0.005975 | 0.005931 | 0.020844 | 0.077111 | -0.015927 | -0.018191 | -0.010745 | -0.071983 | -0.001585 | 1.000000 | -0.104495 | -0.005014 | 0.049752 | 0.041498 | 0.030598 | 0.002971 | -0.002746 | -0.001524 | -0.027179 | -0.027923 | -0.000607 | -0.001585 | -0.001585 | 0.194260 | 0.205145 | 0.014766 |
| glyburide | 0.015784 | 0.034631 | 0.076798 | 0.008707 | -0.002804 | 0.048256 | 0.004919 | 0.023482 | -0.047599 | -0.005929 | -0.001768 | 0.001531 | 0.030886 | -0.000482 | -0.027870 | -0.036659 | 0.017872 | 0.010435 | -0.005157 | -0.024247 | 0.000373 | 0.009977 | 0.129061 | -0.024160 | -0.020817 | -0.005823 | -0.067334 | -0.001414 | -0.104495 | 1.000000 | -0.004473 | 0.027727 | 0.030766 | 0.015094 | -0.000056 | -0.002450 | -0.006327 | -0.071853 | -0.006909 | 0.000245 | -0.001414 | -0.001414 | 0.172392 | 0.183024 | -0.004492 |
| tolbutamide | -0.001455 | -0.001727 | 0.010110 | -0.002326 | 0.006347 | 0.003228 | 0.001791 | 0.001799 | -0.002662 | -0.001944 | -0.001320 | -0.003423 | 0.002943 | 0.000350 | -0.002870 | -0.003545 | 0.006039 | 0.000339 | -0.002879 | 0.001220 | 0.006437 | -0.005526 | -0.006539 | -0.001617 | -0.001113 | -0.000383 | -0.003050 | -0.000056 | -0.005014 | -0.004473 | 1.000000 | -0.003714 | -0.003432 | -0.000736 | -0.000271 | -0.000098 | -0.000253 | -0.001925 | -0.001087 | -0.000169 | -0.000056 | -0.000056 | 0.001000 | 0.007308 | -0.007263 |
| pioglitazone | 0.026105 | 0.002339 | 0.013860 | 0.026059 | 0.018570 | -0.014116 | -0.005729 | 0.008521 | 0.034867 | 0.002210 | -0.015599 | 0.016471 | 0.071584 | 0.012212 | -0.001978 | -0.026804 | 0.024890 | 0.000030 | -0.008180 | 0.002278 | -0.014531 | 0.000223 | 0.060566 | 0.019393 | 0.025830 | -0.007959 | 0.042601 | -0.001174 | 0.049752 | 0.027727 | -0.003714 | 1.000000 | -0.062763 | 0.015377 | 0.000791 | -0.002034 | -0.001659 | 0.003954 | 0.022117 | 0.007190 | -0.001174 | 0.014894 | 0.203180 | 0.151949 | 0.011002 |
| rosiglitazone | 0.005938 | 0.010843 | 0.003034 | 0.004232 | 0.022930 | -0.001694 | -0.008894 | 0.008531 | -0.008782 | 0.016639 | -0.010260 | 0.018742 | 0.052860 | -0.001550 | -0.006844 | -0.021471 | 0.010041 | -0.010618 | -0.003303 | -0.011524 | -0.009275 | 0.009548 | 0.097708 | 0.009031 | 0.013947 | -0.000123 | 0.038655 | -0.001085 | 0.041498 | 0.030766 | -0.003432 | -0.062763 | 1.000000 | 0.002006 | 0.003416 | 0.008079 | -0.000996 | 0.004080 | 0.003340 | -0.003256 | -0.001085 | -0.001085 | 0.191641 | 0.140410 | 0.005522 |
| acarbose | 0.013237 | 0.010581 | 0.008092 | 0.010411 | 0.006061 | 0.006779 | -0.000753 | 0.007231 | -0.002629 | -0.005808 | -0.000654 | -0.000362 | 0.017947 | 0.009388 | 0.004224 | 0.000411 | 0.003061 | 0.000704 | 0.000458 | 0.007741 | 0.005479 | 0.009374 | 0.006246 | 0.011257 | -0.004585 | -0.001576 | 0.018418 | -0.000233 | 0.030598 | 0.015094 | -0.000736 | 0.015377 | 0.002006 | 1.000000 | -0.001117 | -0.000403 | -0.001040 | -0.001790 | 0.013046 | -0.000698 | -0.000233 | -0.000233 | 0.047261 | 0.030097 | 0.007816 |
| miglitol | -0.001307 | 0.009920 | 0.011788 | -0.003532 | -0.001414 | 0.005779 | -0.000763 | 0.005083 | 0.011455 | 0.002816 | -0.002963 | -0.001521 | 0.006422 | -0.002243 | -0.000207 | -0.003851 | 0.006030 | 0.005705 | -0.000319 | -0.000293 | -0.004328 | 0.007741 | 0.005628 | 0.018066 | 0.018302 | -0.000581 | 0.019830 | -0.000086 | 0.002971 | -0.000056 | -0.000271 | 0.000791 | 0.003416 | -0.001117 | 1.000000 | -0.000148 | -0.000383 | 0.000451 | -0.001650 | -0.000257 | -0.000086 | -0.000086 | 0.018472 | 0.011094 | 0.003413 |
| troglitazone | 0.003106 | 0.007860 | -0.001978 | -0.001274 | 0.003307 | 0.008684 | 0.002245 | 0.004746 | -0.007329 | -0.002660 | 0.005036 | -0.005745 | 0.002992 | -0.002152 | -0.001572 | -0.003669 | 0.000456 | 0.001300 | -0.000620 | 0.004710 | -0.001562 | -0.003026 | -0.003582 | -0.000886 | -0.000610 | -0.000210 | 0.009191 | -0.000031 | -0.002746 | -0.002450 | -0.000098 | -0.002034 | 0.008079 | -0.000403 | -0.000148 | 1.000000 | -0.000138 | -0.000391 | -0.000595 | -0.000093 | -0.000031 | -0.000031 | 0.007888 | 0.004002 | 0.001009 |
| tolazamide | 0.003990 | 0.003242 | 0.003605 | 0.000692 | 0.010291 | 0.013139 | 0.001834 | 0.000328 | -0.015677 | -0.004689 | 0.000008 | 0.005154 | -0.002113 | -0.005556 | -0.004059 | -0.003526 | 0.000745 | -0.003710 | -0.003145 | -0.013444 | -0.004032 | -0.000390 | 0.004664 | -0.002287 | -0.001575 | -0.000541 | -0.004314 | -0.000080 | -0.001524 | -0.006327 | -0.000253 | -0.001659 | -0.000996 | -0.001040 | -0.000383 | -0.000138 | 1.000000 | -0.013867 | -0.001537 | -0.000240 | -0.000080 | -0.000080 | -0.002376 | 0.010336 | -0.007513 |
| insulin | -0.039862 | 0.000247 | -0.079078 | -0.076697 | -0.025368 | -0.041842 | 0.005094 | 0.101223 | 0.115265 | -0.014342 | 0.085401 | 0.015020 | 0.198963 | 0.010029 | 0.048501 | 0.060505 | -0.075260 | -0.007776 | 0.013942 | 0.076730 | 0.000884 | 0.107227 | -0.017392 | 0.006058 | 0.001396 | -0.020008 | 0.012479 | 0.003607 | -0.027179 | -0.071853 | -0.001925 | 0.003954 | 0.004080 | -0.001790 | 0.000451 | -0.000391 | -0.013867 | 1.000000 | 0.005828 | -0.000677 | 0.003607 | 0.003607 | 0.461502 | 0.525169 | 0.040750 |
| glyburide.metformin | 0.006384 | 0.002489 | -0.002451 | -0.014159 | -0.000573 | -0.002994 | -0.024616 | -0.006358 | 0.055730 | 0.000051 | -0.010852 | -0.000553 | 0.013382 | -0.008428 | 0.001956 | -0.008426 | 0.015281 | -0.007621 | -0.000101 | -0.005894 | -0.014296 | -0.005008 | -0.021191 | -0.004506 | -0.006775 | -0.002329 | -0.012202 | -0.000344 | -0.027923 | -0.006909 | -0.001087 | 0.022117 | 0.003340 | 0.013046 | -0.001650 | -0.000595 | -0.001537 | 0.005828 | 1.000000 | 0.050992 | -0.000344 | -0.000344 | 0.038712 | 0.044474 | -0.001842 |
| glipizide.metformin | 0.005380 | 0.007965 | 0.003658 | -0.002207 | -0.005046 | 0.000933 | -0.000281 | -0.001692 | 0.010871 | -0.006234 | -0.006685 | -0.006640 | 0.002757 | 0.003037 | -0.002723 | -0.000813 | 0.004664 | 0.005659 | 0.006007 | -0.006428 | -0.002705 | 0.001082 | -0.002748 | -0.001534 | -0.001056 | -0.000363 | -0.002894 | -0.000054 | -0.000607 | 0.000245 | -0.000169 | 0.007190 | -0.003256 | -0.000698 | -0.000257 | -0.000093 | -0.000240 | -0.000677 | 0.050992 | 1.000000 | -0.000054 | -0.000054 | 0.010838 | 0.006933 | 0.001747 |
| metformin.rosiglitazone | -0.011726 | 0.004538 | 0.002400 | -0.000736 | -0.002988 | -0.002174 | 0.001296 | -0.003396 | 0.009326 | -0.002891 | 0.001689 | -0.003317 | -0.002603 | -0.001242 | -0.000907 | 0.001207 | -0.003029 | -0.000006 | 0.006032 | 0.003449 | -0.000902 | -0.001747 | 0.008300 | -0.000511 | -0.000352 | -0.000121 | -0.000964 | -0.000018 | -0.001585 | -0.001414 | -0.000056 | -0.001174 | -0.001085 | -0.000233 | -0.000086 | -0.000031 | -0.000080 | 0.003607 | -0.000344 | -0.000054 | 1.000000 | -0.000018 | 0.004554 | 0.002311 | -0.003530 |
| metformin.pioglitazone | 0.001793 | -0.003935 | -0.000257 | -0.000736 | 0.002888 | -0.000576 | -0.004958 | 0.002268 | -0.000358 | 0.010115 | -0.004330 | -0.000834 | 0.002074 | -0.001242 | -0.000907 | -0.002118 | 0.003943 | -0.005116 | -0.004330 | -0.007491 | -0.000902 | -0.001747 | 0.003116 | -0.000511 | -0.000352 | -0.000121 | -0.000964 | -0.000018 | -0.001585 | -0.001414 | -0.000056 | 0.014894 | -0.001085 | -0.000233 | -0.000086 | -0.000031 | -0.000080 | 0.003607 | -0.000344 | -0.000054 | -0.000018 | 1.000000 | 0.004554 | 0.002311 | -0.003530 |
| change | 0.008300 | 0.012476 | -0.037793 | -0.041219 | 0.003992 | -0.014047 | 0.002583 | 0.112359 | 0.121010 | -0.005111 | 0.062801 | 0.005976 | 0.248529 | 0.027105 | 0.041797 | 0.025420 | -0.033688 | -0.006439 | 0.005824 | 0.055250 | 0.008958 | 0.105614 | 0.325302 | 0.071294 | 0.052927 | -0.007035 | 0.138970 | 0.004554 | 0.194260 | 0.172392 | 0.001000 | 0.203180 | 0.191641 | 0.047261 | 0.018472 | 0.007888 | -0.002376 | 0.461502 | 0.038712 | 0.010838 | 0.004554 | 0.004554 | 1.000000 | 0.507411 | 0.046717 |
| diabetesMed | -0.004537 | 0.015391 | -0.025360 | -0.030585 | -0.003930 | -0.029452 | 0.000535 | 0.059464 | 0.077597 | -0.002299 | 0.030903 | -0.009904 | 0.186247 | 0.017340 | 0.029415 | 0.025559 | -0.028985 | -0.010210 | -0.007452 | 0.019375 | -0.005206 | 0.086291 | 0.267566 | 0.066174 | 0.045552 | 0.015661 | 0.124797 | 0.002311 | 0.205145 | 0.183024 | 0.007308 | 0.151949 | 0.140410 | 0.030097 | 0.011094 | 0.004002 | 0.010336 | 0.525169 | 0.044474 | 0.006933 | 0.002311 | 0.002311 | 0.507411 | 1.000000 | 0.058183 |
| readmitted | 0.014912 | -0.013626 | 0.029704 | 0.027236 | -0.008561 | 0.009300 | 0.030377 | 0.057129 | 0.004353 | -0.044800 | 0.035997 | -0.037714 | 0.050711 | 0.068145 | 0.103321 | 0.233149 | -0.004994 | 0.011850 | 0.027877 | 0.103885 | 0.017684 | -0.013614 | -0.035809 | 0.014286 | 0.007164 | -0.002806 | 0.004760 | 0.002639 | 0.014766 | -0.004492 | -0.007263 | 0.011002 | 0.005522 | 0.007816 | 0.003413 | 0.001009 | -0.007513 | 0.040750 | -0.001842 | 0.001747 | -0.003530 | -0.003530 | 0.046717 | 0.058183 | 1.000000 |
Preliminary possibilities correlated with readmitted
- number_emergency = 0.103321
- number_inpatient = 0.233149
- number_diagnoses = 0.103885
No change from the previous correlation analysis.
Strongest positive correlations with readmitted:
- number_emergency = 0.103
- number_inpatient = 0.233
- number_diagnoses = 0.104
Strongest negative correlations with readmitted:
- MED_SPEC_NUM = -0.045 (not meaningful at this point, because the codes are simply sorted alphabetically)
- num_procedures = -0.038
- metformin = -0.036
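The strongest positive and negative correlations with the target can also be pulled out of the correlation matrix programmatically instead of being read off the full table. The following is a minimal sketch only: `toy` is a small synthetic stand-in for the notebook's `df`, and `rank_target_correlations` is a hypothetical helper name, not part of the original notebook.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the notebook's df: synthetic values, not the real data.
rng = np.random.default_rng(0)
n = 500
toy = pd.DataFrame({
    "number_inpatient": rng.poisson(1.0, n),
    "num_procedures": rng.integers(0, 7, n),
    "metformin": rng.integers(0, 4, n),
})
# Make the synthetic target depend on number_inpatient so one correlation stands out.
toy["readmitted"] = np.clip(toy["number_inpatient"] + rng.integers(0, 2, n), 0, 2)


def rank_target_correlations(frame, target):
    """Correlation of every other column with `target`, sorted ascending."""
    corr = frame.corr()[target].drop(target)
    return corr.sort_values()


ranked = rank_target_correlations(toy, "readmitted")
print(ranked.head(3))  # most negative correlations first
print(ranked.tail(3))  # most positive correlations last
```

On the real data, the same call with `df` in place of `toy` should reproduce the `readmitted` column of the table above in sorted order. Note that `DataFrame.corr()` computes Pearson correlation by default, which treats the 0/1/2 coding of `readmitted` as a numeric scale.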
# heatmap for correlation
plt.figure(figsize=(35,36))
sns.heatmap(df.corr(), annot=True)
# describe for a single column
df['readmitted'].describe()
count 56000.000000
mean 0.572268
std 0.685018
min 0.000000
25% 0.000000
50% 0.000000
75% 1.000000
max 2.000000
Name: readmitted, dtype: float64
# number of rows for each unique value of the 'readmitted' column
df.groupby('readmitted').size()
readmitted
0 30238
1 19477
2 6285
dtype: int64
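The class counts are easier to interpret as shares of the 56,000 rows. The short sketch below works that arithmetic out from the counts printed above; it is only an illustration, and with the full DataFrame available `df['readmitted'].value_counts(normalize=True)` would give the same shares directly.

```python
# Class counts copied from the groupby output above.
counts = {0: 30238, 1: 19477, 2: 6285}

total = sum(counts.values())
shares = {label: n / total for label, n in counts.items()}

for label, share in shares.items():
    print(f"readmitted={label}: {counts[label]:>6} rows ({share:.1%})")

# Accuracy of a trivial model that always predicts the most common class.
majority_baseline = max(shares.values())
print(f"majority-class baseline accuracy: {majority_baseline:.1%}")
```

The classes are clearly imbalanced (roughly 54%, 35% and 11%), so any classifier's accuracy should be judged against the majority-class baseline of about 54% rather than against zero.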
# how many missing values in each column or variable
df.isnull().sum()
race 0
gender 0
age 0
weight 0
admission_type_id 0
discharge_disposition_id 0
admission_source_id 0
time_in_hospital 0
payer_code 0
MED_SPEC_NUM 0
num_lab_procedures 0
num_procedures 0
num_medications 0
number_outpatient 0
number_emergency 0
number_inpatient 0
DIAG_CAT_1 0
DIAG_CAT_2 0
DIAG_CAT_3 0
number_diagnoses 0
max_glu_serum 0
A1Cresult 0
metformin 0
repaglinide 0
nateglinide 0
chlorpropamide 0
glimepiride 0
acetohexamide 0
glipizide 0
glyburide 0
tolbutamide 0
pioglitazone 0
rosiglitazone 0
acarbose 0
miglitol 0
troglitazone 0
tolazamide 0
insulin 0
glyburide.metformin 0
glipizide.metformin 0
metformin.rosiglitazone 0
metformin.pioglitazone 0
change 0
diabetesMed 0
readmitted 0
dtype: int64
# pivot table for 'readmitted'
df.groupby(['readmitted']).count()
| readmitted | race | gender | age | weight | admission_type_id | discharge_disposition_id | admission_source_id | time_in_hospital | payer_code | MED_SPEC_NUM | num_lab_procedures | num_procedures | num_medications | number_outpatient | number_emergency | number_inpatient | DIAG_CAT_1 | DIAG_CAT_2 | DIAG_CAT_3 | number_diagnoses | max_glu_serum | A1Cresult | metformin | repaglinide | nateglinide | chlorpropamide | glimepiride | acetohexamide | glipizide | glyburide | tolbutamide | pioglitazone | rosiglitazone | acarbose | miglitol | troglitazone | tolazamide | insulin | glyburide.metformin | glipizide.metformin | metformin.rosiglitazone | metformin.pioglitazone | change | diabetesMed |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 30238 | 30238 | 30238 | 30238 | 30238 | 30238 | 30238 | 30238 | 30238 | 30238 | 30238 | 30238 | 30238 | 30238 | 30238 | 30238 | 30238 | 30238 | 30238 | 30238 | 30238 | 30238 | 30238 | 30238 | 30238 | 30238 | 30238 | 30238 | 30238 | 30238 | 30238 | 30238 | 30238 | 30238 | 30238 | 30238 | 30238 | 30238 | 30238 | 30238 | 30238 | 30238 | 30238 | 30238 |
| 1 | 19477 | 19477 | 19477 | 19477 | 19477 | 19477 | 19477 | 19477 | 19477 | 19477 | 19477 | 19477 | 19477 | 19477 | 19477 | 19477 | 19477 | 19477 | 19477 | 19477 | 19477 | 19477 | 19477 | 19477 | 19477 | 19477 | 19477 | 19477 | 19477 | 19477 | 19477 | 19477 | 19477 | 19477 | 19477 | 19477 | 19477 | 19477 | 19477 | 19477 | 19477 | 19477 | 19477 | 19477 |
| 2 | 6285 | 6285 | 6285 | 6285 | 6285 | 6285 | 6285 | 6285 | 6285 | 6285 | 6285 | 6285 | 6285 | 6285 | 6285 | 6285 | 6285 | 6285 | 6285 | 6285 | 6285 | 6285 | 6285 | 6285 | 6285 | 6285 | 6285 | 6285 | 6285 | 6285 | 6285 | 6285 | 6285 | 6285 | 6285 | 6285 | 6285 | 6285 | 6285 | 6285 | 6285 | 6285 | 6285 | 6285 |
# pivot table for 'readmitted' showing mean value, not count
df.groupby(['readmitted']).mean()
| readmitted | race | gender | age | weight | admission_type_id | discharge_disposition_id | admission_source_id | time_in_hospital | payer_code | MED_SPEC_NUM | num_lab_procedures | num_procedures | num_medications | number_outpatient | number_emergency | number_inpatient | DIAG_CAT_1 | DIAG_CAT_2 | DIAG_CAT_3 | number_diagnoses | max_glu_serum | A1Cresult | metformin | repaglinide | nateglinide | chlorpropamide | glimepiride | acetohexamide | glipizide | glyburide | tolbutamide | pioglitazone | rosiglitazone | acarbose | miglitol | troglitazone | tolazamide | insulin | glyburide.metformin | glipizide.metformin | metformin.rosiglitazone | metformin.pioglitazone | change | diabetesMed |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2.589060 | 0.472088 | 6.050665 | 0.098717 | 2.025994 | 3.818606 | 5.613665 | 4.254415 | 4.325617 | 11.423771 | 42.454957 | 1.404061 | 15.639725 | 0.272042 | 0.105496 | 0.382036 | 14.233018 | 11.918612 | 11.127522 | 7.220914 | 0.085191 | 0.376315 | 0.422912 | 0.026324 | 0.012666 | 0.001654 | 0.099345 | 0.000000 | 0.243336 | 0.211820 | 0.000529 | 0.137245 | 0.119783 | 0.004928 | 0.000628 | 0.000066 | 0.000926 | 1.017792 | 0.013030 | 0.000265 | 0.000066 | 0.000066 | 0.438587 | 0.744857 |
| 1 | 2.614930 | 0.454177 | 6.146121 | 0.164091 | 2.012887 | 3.324383 | 5.958053 | 4.508703 | 4.467834 | 9.700056 | 43.880269 | 1.250655 | 16.344458 | 0.495456 | 0.294039 | 0.845356 | 14.223700 | 12.124352 | 11.623351 | 7.658366 | 0.098475 | 0.367613 | 0.381732 | 0.034091 | 0.016122 | 0.002310 | 0.108230 | 0.000103 | 0.270370 | 0.210197 | 0.000205 | 0.162448 | 0.139241 | 0.008472 | 0.001284 | 0.000205 | 0.000616 | 1.097808 | 0.014376 | 0.000411 | 0.000000 | 0.000000 | 0.491605 | 0.799096 |
| 2 | 2.624821 | 0.459666 | 6.164041 | 0.120923 | 1.985521 | 4.487828 | 5.820366 | 4.747176 | 4.274781 | 10.037232 | 44.156563 | 1.272076 | 16.748449 | 0.428640 | 0.335402 | 1.218457 | 14.086396 | 12.104694 | 11.639300 | 7.672554 | 0.105489 | 0.332538 | 0.336356 | 0.034208 | 0.014638 | 0.000318 | 0.100239 | 0.000000 | 0.261098 | 0.201273 | 0.000000 | 0.138584 | 0.113286 | 0.005410 | 0.000636 | 0.000000 | 0.000000 | 1.135561 | 0.010501 | 0.000318 | 0.000000 | 0.000000 | 0.488942 | 0.799204 |
Mean 'readmitted' Pivot Table Notes:¶
- Doesn't appear to be a factor:
  - race
  - gender
  - admission_type_id / admission
  - payer_code
- Slight possibility of being a factor:
  - age
  - num_lab_procedures
  - num_procedures (interesting that those not returning at all had the highest average number of procedures at 1.404)
  - num_medications
  - number_diagnoses
  - insulin
  - change
  - diabetesMed
- Appears to be a possible factor:
  - weight
  - discharge_disposition_id
  - time_in_hospital
  - number_outpatient
  - number_emergency
  - number_inpatient
  - max_glu_serum
  - A1Cresult (negative factor)
  - metformin (negative factor)
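The grouping in these notes was done by eye from the table of means. One way to make the comparison less subjective is to scale each column's spread in group means by that column's overall standard deviation and sort the result. The sketch below does this on a small made-up frame; the helper name `rank_by_group_mean_spread` and the toy values are assumptions for illustration, and the score is a rough heuristic rather than the method used in this notebook.

```python
import pandas as pd

# Toy stand-in for the notebook's df; the values below are made up.
toy = pd.DataFrame({
    "readmitted":       [0, 0, 0, 1, 1, 2, 2, 2],
    "gender":           [0, 1, 0, 1, 0, 1, 0, 1],
    "number_inpatient": [0, 0, 1, 1, 1, 2, 1, 3],
})

def rank_by_group_mean_spread(frame, target):
    """Rank columns by (largest group mean - smallest group mean) / overall std."""
    group_means = frame.groupby(target).mean()
    spread = group_means.max() - group_means.min()
    overall_std = frame.drop(columns=target).std()
    # Columns with zero variance would give inf or NaN here; inspect them separately.
    return (spread / overall_std).sort_values(ascending=False)

ranking = rank_by_group_mean_spread(toy, "readmitted")
print(ranking)
```

On the real data the call would be `rank_by_group_mean_spread(df, 'readmitted')`. Columns near the top of the ranking are the ones whose group means differ most relative to their own variability, which is only a screening signal and not evidence of a causal factor.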
#histograms for all factors
df.hist(figsize=(16,16))
[Figure: histograms of every column in df, arranged in a 7 x 7 grid of axes]
#create 2nd 'readmitted' column
df['readm2'] = df['readmitted']
df.head(12)
| | race | gender | age | weight | admission_type_id | discharge_disposition_id | admission_source_id | time_in_hospital | payer_code | MED_SPEC_NUM | num_lab_procedures | num_procedures | num_medications | number_outpatient | number_emergency | number_inpatient | DIAG_CAT_1 | DIAG_CAT_2 | DIAG_CAT_3 | number_diagnoses | max_glu_serum | A1Cresult | metformin | repaglinide | nateglinide | chlorpropamide | glimepiride | acetohexamide | glipizide | glyburide | tolbutamide | pioglitazone | rosiglitazone | acarbose | miglitol | troglitazone | tolazamide | insulin | glyburide.metformin | glipizide.metformin | metformin.rosiglitazone | metformin.pioglitazone | change | diabetesMed | readmitted | readm2 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 3 | 0 | 8 | 0 | 3 | 1 | 4 | 5 | 0 | 0 | 39 | 3 | 11 | 0 | 0 | 0 | 10 | 4 | 18 | 7 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 2 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 1 | 1 |
| 1 | 3 | 0 | 7 | 0 | 5 | 3 | 1 | 6 | 7 | 0 | 79 | 1 | 25 | 3 | 0 | 0 | 16 | 13 | 16 | 9 | 0 | 0 | 2 | 0 | 0 | 0 | 0 | 0 | 0 | 2 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 3 | 0 | 0 | 0 | 0 | 1 | 1 | 0 | 0 |
| 2 | 5 | 0 | 6 | 0 | 1 | 22 | 7 | 4 | 14 | 18 | 29 | 2 | 18 | 0 | 0 | 1 | 24 | 18 | 2 | 9 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 2 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 |
| 3 | 3 | 1 | 7 | 0 | 1 | 1 | 7 | 3 | 0 | 18 | 72 | 3 | 18 | 0 | 0 | 0 | 17 | 4 | 3 | 9 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 2 | 0 | 0 | 0 | 0 | 0 | 1 | 1 | 1 |
| 4 | 3 | 0 | 3 | 0 | 2 | 1 | 1 | 3 | 0 | 0 | 21 | 1 | 6 | 0 | 0 | 0 | 23 | 18 | 32 | 9 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 5 | 3 | 1 | 7 | 0 | 2 | 1 | 1 | 2 | 0 | 18 | 4 | 0 | 7 | 0 | 0 | 0 | 14 | 9 | 3 | 8 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 6 | 3 | 0 | 4 | 0 | 1 | 1 | 7 | 6 | 14 | 33 | 89 | 0 | 25 | 0 | 2 | 1 | 25 | 10 | 16 | 9 | 0 | 0 | 2 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 2 | 0 | 0 | 0 | 0 | 1 | 1 | 0 | 0 |
| 7 | 3 | 0 | 7 | 0 | 1 | 6 | 7 | 4 | 6 | 0 | 63 | 0 | 22 | 0 | 2 | 4 | 16 | 3 | 3 | 5 | 0 | 0 | 2 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 1 | 1 |
| 8 | 3 | 1 | 6 | 0 | 1 | 1 | 7 | 6 | 7 | 0 | 45 | 0 | 24 | 0 | 0 | 0 | 16 | 9 | 3 | 7 | 0 | 3 | 2 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 2 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 1 | 1 | 0 | 0 |
| 9 | 3 | 1 | 8 | 0 | 1 | 1 | 7 | 2 | 3 | 0 | 45 | 0 | 13 | 0 | 0 | 0 | 17 | 3 | 3 | 9 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 2 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 1 | 1 | 0 | 0 |
| 10 | 3 | 1 | 6 | 0 | 2 | 1 | 1 | 3 | 7 | 0 | 57 | 6 | 21 | 0 | 0 | 0 | 12 | 10 | 4 | 9 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 2 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 |
| 11 | 3 | 0 | 4 | 0 | 6 | 1 | 17 | 6 | 0 | 18 | 81 | 0 | 26 | 0 | 0 | 0 | 13 | 21 | 21 | 9 | 0 | 3 | 0 | 0 | 0 | 0 | 0 | 0 | 2 | 0 | 0 | 2 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 1 | 2 | 2 |
NB: Using 12 rows because the 12th row (row 11) is the first instance of a patient being readmitted < 30 days¶
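Rather than finding that row by inspection, the first `<30` case can be located programmatically. The sketch below uses a made-up column with the notebook's encoding (0 = NO, 1 = >30, 2 = <30) and assumes the default integer index, so that index label and row position coincide; the names `toy`, `is_under_30`, and `preview` are illustrative only.

```python
import pandas as pd

# Toy column using the notebook's encoding (0 = NO, 1 = >30, 2 = <30); values are made up.
toy = pd.DataFrame({"readmitted": [1, 0, 0, 1, 2, 0, 2]})

# Boolean mask of rows readmitted in under 30 days.
is_under_30 = toy["readmitted"] == 2

# idxmax() returns the index label of the first True value; guard against "no match".
first_under_30 = is_under_30.idxmax() if is_under_30.any() else None

if first_under_30 is not None:
    # With a default RangeIndex, label + 1 is the number of rows needed to include that case.
    preview = toy.head(first_under_30 + 1)
    print(preview)
```

Applied to the notebook's frame, the same mask on `df['readmitted']` should point at row 11, matching the note above.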
# map the numeric codes in 'readm2' back to their text labels:
#   0 -> 'NO'   (not readmitted)
#   1 -> '>30'  (readmitted after more than 30 days)
#   2 -> '<30'  (readmitted in under 30 days)
df = df.replace({'readm2': {0: 'NO', 1: '>30', 2: '<30'}})
df.head(12)
| | race | gender | age | weight | admission_type_id | discharge_disposition_id | admission_source_id | time_in_hospital | payer_code | MED_SPEC_NUM | num_lab_procedures | num_procedures | num_medications | number_outpatient | number_emergency | number_inpatient | DIAG_CAT_1 | DIAG_CAT_2 | DIAG_CAT_3 | number_diagnoses | max_glu_serum | A1Cresult | metformin | repaglinide | nateglinide | chlorpropamide | glimepiride | acetohexamide | glipizide | glyburide | tolbutamide | pioglitazone | rosiglitazone | acarbose | miglitol | troglitazone | tolazamide | insulin | glyburide.metformin | glipizide.metformin | metformin.rosiglitazone | metformin.pioglitazone | change | diabetesMed | readmitted | readm2 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 3 | 0 | 8 | 0 | 3 | 1 | 4 | 5 | 0 | 0 | 39 | 3 | 11 | 0 | 0 | 0 | 10 | 4 | 18 | 7 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 2 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 1 | >30 |
| 1 | 3 | 0 | 7 | 0 | 5 | 3 | 1 | 6 | 7 | 0 | 79 | 1 | 25 | 3 | 0 | 0 | 16 | 13 | 16 | 9 | 0 | 0 | 2 | 0 | 0 | 0 | 0 | 0 | 0 | 2 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 3 | 0 | 0 | 0 | 0 | 1 | 1 | 0 | NO |
| 2 | 5 | 0 | 6 | 0 | 1 | 22 | 7 | 4 | 14 | 18 | 29 | 2 | 18 | 0 | 0 | 1 | 24 | 18 | 2 | 9 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 2 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | NO |
| 3 | 3 | 1 | 7 | 0 | 1 | 1 | 7 | 3 | 0 | 18 | 72 | 3 | 18 | 0 | 0 | 0 | 17 | 4 | 3 | 9 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 2 | 0 | 0 | 0 | 0 | 0 | 1 | 1 | >30 |
| 4 | 3 | 0 | 3 | 0 | 2 | 1 | 1 | 3 | 0 | 0 | 21 | 1 | 6 | 0 | 0 | 0 | 23 | 18 | 32 | 9 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | NO |
| 5 | 3 | 1 | 7 | 0 | 2 | 1 | 1 | 2 | 0 | 18 | 4 | 0 | 7 | 0 | 0 | 0 | 14 | 9 | 3 | 8 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | NO |
| 6 | 3 | 0 | 4 | 0 | 1 | 1 | 7 | 6 | 14 | 33 | 89 | 0 | 25 | 0 | 2 | 1 | 25 | 10 | 16 | 9 | 0 | 0 | 2 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 2 | 0 | 0 | 0 | 0 | 1 | 1 | 0 | NO |
| 7 | 3 | 0 | 7 | 0 | 1 | 6 | 7 | 4 | 6 | 0 | 63 | 0 | 22 | 0 | 2 | 4 | 16 | 3 | 3 | 5 | 0 | 0 | 2 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 1 | >30 |
| 8 | 3 | 1 | 6 | 0 | 1 | 1 | 7 | 6 | 7 | 0 | 45 | 0 | 24 | 0 | 0 | 0 | 16 | 9 | 3 | 7 | 0 | 3 | 2 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 2 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 1 | 1 | 0 | NO |
| 9 | 3 | 1 | 8 | 0 | 1 | 1 | 7 | 2 | 3 | 0 | 45 | 0 | 13 | 0 | 0 | 0 | 17 | 3 | 3 | 9 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 2 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 1 | 1 | 0 | NO |
| 10 | 3 | 1 | 6 | 0 | 2 | 1 | 1 | 3 | 7 | 0 | 57 | 6 | 21 | 0 | 0 | 0 | 12 | 10 | 4 | 9 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 2 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | NO |
| 11 | 3 | 0 | 4 | 0 | 6 | 1 | 17 | 6 | 0 | 18 | 81 | 0 | 26 | 0 | 0 | 0 | 13 | 21 | 21 | 9 | 0 | 3 | 0 | 0 | 0 | 0 | 0 | 0 | 2 | 0 | 0 | 2 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 1 | 2 | <30 |
# collapse the text labels in 'readm2' to a binary target:
#   'NO'  -> 0
#   '>30' -> 0
#   '<30' -> 1  (readmitted in under 30 days)
df = df.replace({'readm2': {'NO': 0, '>30': 0, '<30': 1}})
df.head(12)
| | race | gender | age | weight | admission_type_id | discharge_disposition_id | admission_source_id | time_in_hospital | payer_code | MED_SPEC_NUM | num_lab_procedures | num_procedures | num_medications | number_outpatient | number_emergency | number_inpatient | DIAG_CAT_1 | DIAG_CAT_2 | DIAG_CAT_3 | number_diagnoses | max_glu_serum | A1Cresult | metformin | repaglinide | nateglinide | chlorpropamide | glimepiride | acetohexamide | glipizide | glyburide | tolbutamide | pioglitazone | rosiglitazone | acarbose | miglitol | troglitazone | tolazamide | insulin | glyburide.metformin | glipizide.metformin | metformin.rosiglitazone | metformin.pioglitazone | change | diabetesMed | readmitted | readm2 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 3 | 0 | 8 | 0 | 3 | 1 | 4 | 5 | 0 | 0 | 39 | 3 | 11 | 0 | 0 | 0 | 10 | 4 | 18 | 7 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 2 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 1 | 0 |
| 1 | 3 | 0 | 7 | 0 | 5 | 3 | 1 | 6 | 7 | 0 | 79 | 1 | 25 | 3 | 0 | 0 | 16 | 13 | 16 | 9 | 0 | 0 | 2 | 0 | 0 | 0 | 0 | 0 | 0 | 2 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 3 | 0 | 0 | 0 | 0 | 1 | 1 | 0 | 0 |
| 2 | 5 | 0 | 6 | 0 | 1 | 22 | 7 | 4 | 14 | 18 | 29 | 2 | 18 | 0 | 0 | 1 | 24 | 18 | 2 | 9 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 2 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 |
| 3 | 3 | 1 | 7 | 0 | 1 | 1 | 7 | 3 | 0 | 18 | 72 | 3 | 18 | 0 | 0 | 0 | 17 | 4 | 3 | 9 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 2 | 0 | 0 | 0 | 0 | 0 | 1 | 1 | 0 |
| 4 | 3 | 0 | 3 | 0 | 2 | 1 | 1 | 3 | 0 | 0 | 21 | 1 | 6 | 0 | 0 | 0 | 23 | 18 | 32 | 9 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 5 | 3 | 1 | 7 | 0 | 2 | 1 | 1 | 2 | 0 | 18 | 4 | 0 | 7 | 0 | 0 | 0 | 14 | 9 | 3 | 8 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 6 | 3 | 0 | 4 | 0 | 1 | 1 | 7 | 6 | 14 | 33 | 89 | 0 | 25 | 0 | 2 | 1 | 25 | 10 | 16 | 9 | 0 | 0 | 2 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 2 | 0 | 0 | 0 | 0 | 1 | 1 | 0 | 0 |
| 7 | 3 | 0 | 7 | 0 | 1 | 6 | 7 | 4 | 6 | 0 | 63 | 0 | 22 | 0 | 2 | 4 | 16 | 3 | 3 | 5 | 0 | 0 | 2 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 1 | 0 |
| 8 | 3 | 1 | 6 | 0 | 1 | 1 | 7 | 6 | 7 | 0 | 45 | 0 | 24 | 0 | 0 | 0 | 16 | 9 | 3 | 7 | 0 | 3 | 2 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 2 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 1 | 1 | 0 | 0 |
| 9 | 3 | 1 | 8 | 0 | 1 | 1 | 7 | 2 | 3 | 0 | 45 | 0 | 13 | 0 | 0 | 0 | 17 | 3 | 3 | 9 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 2 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 1 | 1 | 0 | 0 |
| 10 | 3 | 1 | 6 | 0 | 2 | 1 | 1 | 3 | 7 | 0 | 57 | 6 | 21 | 0 | 0 | 0 | 12 | 10 | 4 | 9 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 2 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 |
| 11 | 3 | 0 | 4 | 0 | 6 | 1 | 17 | 6 | 0 | 18 | 81 | 0 | 26 | 0 | 0 | 0 | 13 | 21 | 21 | 9 | 0 | 3 | 0 | 0 | 0 | 0 | 0 | 0 | 2 | 0 | 0 | 2 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 1 | 2 | 1 |
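The two replacement passes above (codes to labels, then labels to 0/1) end in a flag that is 1 only where `readmitted` is 2. As a minimal sketch, assuming the same encoding, the flag can also be derived in one step with a boolean comparison or an explicit `map`; the toy values below are made up.

```python
import pandas as pd

# Toy column using the notebook's encoding (0 = NO, 1 = >30, 2 = <30); values are made up.
toy = pd.DataFrame({"readmitted": [1, 0, 0, 1, 2, 0, 2]})

# 1 when the patient was readmitted in under 30 days (code 2), otherwise 0.
toy["readm2"] = (toy["readmitted"] == 2).astype(int)

# Equivalent explicit mapping, which spells out every code.
mapped = toy["readmitted"].map({0: 0, 1: 0, 2: 1})

print(toy)
```

The comparison form is shorter; the `map` form leaves `NaN` behind if an unexpected code appears, which can be a useful check on unfamiliar data.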
# drop readmitted - will use readm2 as Y instead
df = df.drop('readmitted', axis=1)
df.head(12)
| | race | gender | age | weight | admission_type_id | discharge_disposition_id | admission_source_id | time_in_hospital | payer_code | MED_SPEC_NUM | num_lab_procedures | num_procedures | num_medications | number_outpatient | number_emergency | number_inpatient | DIAG_CAT_1 | DIAG_CAT_2 | DIAG_CAT_3 | number_diagnoses | max_glu_serum | A1Cresult | metformin | repaglinide | nateglinide | chlorpropamide | glimepiride | acetohexamide | glipizide | glyburide | tolbutamide | pioglitazone | rosiglitazone | acarbose | miglitol | troglitazone | tolazamide | insulin | glyburide.metformin | glipizide.metformin | metformin.rosiglitazone | metformin.pioglitazone | change | diabetesMed | readm2 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 3 | 0 | 8 | 0 | 3 | 1 | 4 | 5 | 0 | 0 | 39 | 3 | 11 | 0 | 0 | 0 | 10 | 4 | 18 | 7 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 2 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 |
| 1 | 3 | 0 | 7 | 0 | 5 | 3 | 1 | 6 | 7 | 0 | 79 | 1 | 25 | 3 | 0 | 0 | 16 | 13 | 16 | 9 | 0 | 0 | 2 | 0 | 0 | 0 | 0 | 0 | 0 | 2 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 3 | 0 | 0 | 0 | 0 | 1 | 1 | 0 |
| 2 | 5 | 0 | 6 | 0 | 1 | 22 | 7 | 4 | 14 | 18 | 29 | 2 | 18 | 0 | 0 | 1 | 24 | 18 | 2 | 9 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 2 | 0 | 0 | 0 | 0 | 0 | 1 | 0 |
| 3 | 3 | 1 | 7 | 0 | 1 | 1 | 7 | 3 | 0 | 18 | 72 | 3 | 18 | 0 | 0 | 0 | 17 | 4 | 3 | 9 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 2 | 0 | 0 | 0 | 0 | 0 | 1 | 0 |
| 4 | 3 | 0 | 3 | 0 | 2 | 1 | 1 | 3 | 0 | 0 | 21 | 1 | 6 | 0 | 0 | 0 | 23 | 18 | 32 | 9 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 5 | 3 | 1 | 7 | 0 | 2 | 1 | 1 | 2 | 0 | 18 | 4 | 0 | 7 | 0 | 0 | 0 | 14 | 9 | 3 | 8 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 6 | 3 | 0 | 4 | 0 | 1 | 1 | 7 | 6 | 14 | 33 | 89 | 0 | 25 | 0 | 2 | 1 | 25 | 10 | 16 | 9 | 0 | 0 | 2 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 2 | 0 | 0 | 0 | 0 | 1 | 1 | 0 |
| 7 | 3 | 0 | 7 | 0 | 1 | 6 | 7 | 4 | 6 | 0 | 63 | 0 | 22 | 0 | 2 | 4 | 16 | 3 | 3 | 5 | 0 | 0 | 2 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 |
| 8 | 3 | 1 | 6 | 0 | 1 | 1 | 7 | 6 | 7 | 0 | 45 | 0 | 24 | 0 | 0 | 0 | 16 | 9 | 3 | 7 | 0 | 3 | 2 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 2 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 1 | 1 | 0 |
| 9 | 3 | 1 | 8 | 0 | 1 | 1 | 7 | 2 | 3 | 0 | 45 | 0 | 13 | 0 | 0 | 0 | 17 | 3 | 3 | 9 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 2 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 1 | 1 | 0 |
| 10 | 3 | 1 | 6 | 0 | 2 | 1 | 1 | 3 | 7 | 0 | 57 | 6 | 21 | 0 | 0 | 0 | 12 | 10 | 4 | 9 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 2 | 0 | 0 | 0 | 0 | 0 | 1 | 0 |
| 11 | 3 | 0 | 4 | 0 | 6 | 1 | 17 | 6 | 0 | 18 | 81 | 0 | 26 | 0 | 0 | 0 | 13 | 21 | 21 | 9 | 0 | 3 | 0 | 0 | 0 | 0 | 0 | 0 | 2 | 0 | 0 | 2 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 1 | 1 |
#basic statistics
df.describe()
| | race | gender | age | weight | admission_type_id | discharge_disposition_id | admission_source_id | time_in_hospital | payer_code | MED_SPEC_NUM | num_lab_procedures | num_procedures | num_medications | number_outpatient | number_emergency | number_inpatient | DIAG_CAT_1 | DIAG_CAT_2 | DIAG_CAT_3 | number_diagnoses | max_glu_serum | A1Cresult | metformin | repaglinide | nateglinide | chlorpropamide | glimepiride | acetohexamide | glipizide | glyburide | tolbutamide | pioglitazone | rosiglitazone | acarbose | miglitol | troglitazone | tolazamide | insulin | glyburide.metformin | glipizide.metformin | metformin.rosiglitazone | metformin.pioglitazone | change | diabetesMed | readm2 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| count | 56000.000000 | 56000.000000 | 56000.000000 | 56000.000000 | 56000.000000 | 56000.000000 | 56000.000000 | 56000.000000 | 56000.000000 | 56000.000000 | 56000.000000 | 56000.000000 | 56000.000000 | 56000.000000 | 56000.000000 | 56000.000000 | 56000.000000 | 56000.000000 | 56000.000000 | 56000.000000 | 56000.000000 | 56000.000000 | 56000.000000 | 56000.000000 | 56000.000000 | 56000.000000 | 56000.000000 | 56000.000000 | 56000.000000 | 56000.000000 | 56000.000000 | 56000.000000 | 56000.000000 | 56000.000000 | 56000.000000 | 56000.000000 | 56000.000000 | 56000.000000 | 56000.000000 | 56000.000000 | 56000.000000 | 56000.000000 | 56000.000000 | 56000.000000 | 56000.000000 |
| mean | 2.602071 | 0.464464 | 6.096589 | 0.123946 | 2.016893 | 3.721821 | 5.756643 | 4.398161 | 4.369375 | 10.668643 | 43.141661 | 1.335893 | 16.009268 | 0.367321 | 0.196875 | 0.637054 | 14.213321 | 12.011054 | 11.357411 | 7.423750 | 0.092089 | 0.368375 | 0.398875 | 0.029911 | 0.014089 | 0.001732 | 0.102536 | 0.000036 | 0.254732 | 0.210071 | 0.000357 | 0.146161 | 0.125821 | 0.006214 | 0.000857 | 0.000107 | 0.000714 | 1.058839 | 0.013214 | 0.000321 | 0.000036 | 0.000036 | 0.462679 | 0.769821 | 0.112232 |
| std | 0.937754 | 0.498740 | 1.590761 | 0.712004 | 1.438340 | 5.291517 | 4.053838 | 2.984346 | 4.363828 | 15.595799 | 19.656507 | 1.702009 | 8.132455 | 1.249570 | 0.916820 | 1.270768 | 7.272908 | 7.443902 | 8.157131 | 1.931488 | 0.431655 | 0.890972 | 0.815169 | 0.247161 | 0.169132 | 0.060480 | 0.449274 | 0.008452 | 0.678992 | 0.627625 | 0.026724 | 0.525985 | 0.490002 | 0.112904 | 0.042249 | 0.014638 | 0.037790 | 1.102484 | 0.162472 | 0.025353 | 0.008452 | 0.008452 | 0.498610 | 0.420951 | 0.315655 |
| min | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 0.000000 | 0.000000 | 1.000000 | 0.000000 | 1.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 1.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 |
| 25% | 3.000000 | 0.000000 | 5.000000 | 0.000000 | 1.000000 | 1.000000 | 1.000000 | 2.000000 | 0.000000 | 0.000000 | 32.000000 | 0.000000 | 10.000000 | 0.000000 | 0.000000 | 0.000000 | 10.000000 | 4.000000 | 3.000000 | 6.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 1.000000 | 0.000000 |
| 50% | 3.000000 | 0.000000 | 6.000000 | 0.000000 | 1.000000 | 1.000000 | 7.000000 | 4.000000 | 6.000000 | 4.000000 | 44.000000 | 1.000000 | 15.000000 | 0.000000 | 0.000000 | 0.000000 | 15.000000 | 12.000000 | 10.000000 | 8.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 1.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 1.000000 | 0.000000 |
| 75% | 3.000000 | 1.000000 | 7.000000 | 0.000000 | 3.000000 | 4.000000 | 7.000000 | 6.000000 | 7.000000 | 18.000000 | 57.000000 | 2.000000 | 20.000000 | 0.000000 | 0.000000 | 1.000000 | 18.000000 | 17.000000 | 17.000000 | 9.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 2.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 1.000000 | 1.000000 | 0.000000 |
| max | 5.000000 | 1.000000 | 9.000000 | 9.000000 | 8.000000 | 28.000000 | 25.000000 | 14.000000 | 16.000000 | 63.000000 | 132.000000 | 6.000000 | 75.000000 | 42.000000 | 76.000000 | 21.000000 | 32.000000 | 32.000000 | 32.000000 | 16.000000 | 3.000000 | 3.000000 | 3.000000 | 3.000000 | 3.000000 | 3.000000 | 3.000000 | 2.000000 | 3.000000 | 3.000000 | 2.000000 | 3.000000 | 3.000000 | 3.000000 | 3.000000 | 2.000000 | 2.000000 | 3.000000 | 3.000000 | 2.000000 | 2.000000 | 2.000000 | 1.000000 | 1.000000 | 1.000000 |
# describe for a single column
df['readm2'].describe()
count    56000.000000
mean         0.112232
std          0.315655
min          0.000000
25%          0.000000
50%          0.000000
75%          0.000000
max          1.000000
Name: readm2, dtype: float64
# number of rows for each value of the 'readm2' column
df.groupby('readm2').size()
readm2
0    49715
1     6285
dtype: int64
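These counts show that the binary target is imbalanced: 6,285 of 56,000 rows (about 11.2%) are positive, which matches the `readm2` mean of 0.112232 reported by `describe()`. The sketch below turns the counts into class shares and the accuracy of a trivial "always predict 0" baseline, a useful reference point when judging classifiers later. The counts are copied from the output above; the variable names are illustrative.

```python
import pandas as pd

# Class counts copied from the groupby output above.
counts = pd.Series({0: 49715, 1: 6285}, name="readm2")

# Share of each class in the data set.
share = counts / counts.sum()

# Accuracy obtained by always predicting the majority class.
majority_baseline = share.max()

print(share)
print(majority_baseline)
```

On the full frame, `df['readm2'].value_counts(normalize=True)` gives the same shares directly. Any model whose accuracy is not clearly above the majority baseline has learned little about the `<30` class, so metrics such as recall or ROC AUC are worth reporting alongside accuracy.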
# pivot table for 'readm2' showing counts
df.groupby(['readm2']).count()
| readm2 | race | gender | age | weight | admission_type_id | discharge_disposition_id | admission_source_id | time_in_hospital | payer_code | MED_SPEC_NUM | num_lab_procedures | num_procedures | num_medications | number_outpatient | number_emergency | number_inpatient | DIAG_CAT_1 | DIAG_CAT_2 | DIAG_CAT_3 | number_diagnoses | max_glu_serum | A1Cresult | metformin | repaglinide | nateglinide | chlorpropamide | glimepiride | acetohexamide | glipizide | glyburide | tolbutamide | pioglitazone | rosiglitazone | acarbose | miglitol | troglitazone | tolazamide | insulin | glyburide.metformin | glipizide.metformin | metformin.rosiglitazone | metformin.pioglitazone | change | diabetesMed |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 49715 | 49715 | 49715 | 49715 | 49715 | 49715 | 49715 | 49715 | 49715 | 49715 | 49715 | 49715 | 49715 | 49715 | 49715 | 49715 | 49715 | 49715 | 49715 | 49715 | 49715 | 49715 | 49715 | 49715 | 49715 | 49715 | 49715 | 49715 | 49715 | 49715 | 49715 | 49715 | 49715 | 49715 | 49715 | 49715 | 49715 | 49715 | 49715 | 49715 | 49715 | 49715 | 49715 | 49715 |
| 1 | 6285 | 6285 | 6285 | 6285 | 6285 | 6285 | 6285 | 6285 | 6285 | 6285 | 6285 | 6285 | 6285 | 6285 | 6285 | 6285 | 6285 | 6285 | 6285 | 6285 | 6285 | 6285 | 6285 | 6285 | 6285 | 6285 | 6285 | 6285 | 6285 | 6285 | 6285 | 6285 | 6285 | 6285 | 6285 | 6285 | 6285 | 6285 | 6285 | 6285 | 6285 | 6285 | 6285 | 6285 |
# pivot table for 'readm2' showing mean value, not count
df.groupby(['readm2']).mean()
| readm2 | race | gender | age | weight | admission_type_id | discharge_disposition_id | admission_source_id | time_in_hospital | payer_code | MED_SPEC_NUM | num_lab_procedures | num_procedures | num_medications | number_outpatient | number_emergency | number_inpatient | DIAG_CAT_1 | DIAG_CAT_2 | DIAG_CAT_3 | number_diagnoses | max_glu_serum | A1Cresult | metformin | repaglinide | nateglinide | chlorpropamide | glimepiride | acetohexamide | glipizide | glyburide | tolbutamide | pioglitazone | rosiglitazone | acarbose | miglitol | troglitazone | tolazamide | insulin | glyburide.metformin | glipizide.metformin | metformin.rosiglitazone | metformin.pioglitazone | change | diabetesMed |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2.599195 | 0.465071 | 6.088062 | 0.124329 | 2.020859 | 3.624982 | 5.748587 | 4.354038 | 4.381334 | 10.748466 | 43.013356 | 1.343961 | 15.915820 | 0.35957 | 0.179362 | 0.563552 | 14.229367 | 11.999216 | 11.321774 | 7.392296 | 0.090395 | 0.372906 | 0.406779 | 0.029367 | 0.014020 | 0.001911 | 0.102826 | 0.00004 | 0.253927 | 0.211184 | 0.000402 | 0.147119 | 0.127406 | 0.006316 | 0.000885 | 0.000121 | 0.000805 | 1.049140 | 0.013557 | 0.000322 | 0.00004 | 0.00004 | 0.459358 | 0.766107 |
| 1 | 2.624821 | 0.459666 | 6.164041 | 0.120923 | 1.985521 | 4.487828 | 5.820366 | 4.747176 | 4.274781 | 10.037232 | 44.156563 | 1.272076 | 16.748449 | 0.42864 | 0.335402 | 1.218457 | 14.086396 | 12.104694 | 11.639300 | 7.672554 | 0.105489 | 0.332538 | 0.336356 | 0.034208 | 0.014638 | 0.000318 | 0.100239 | 0.00000 | 0.261098 | 0.201273 | 0.000000 | 0.138584 | 0.113286 | 0.005410 | 0.000636 | 0.000000 | 0.000000 | 1.135561 | 0.010501 | 0.000318 | 0.00000 | 0.00000 | 0.488942 | 0.799204 |
# pairwise (Pearson) correlation matrix for all columns
df.corr()
| | race | gender | age | weight | admission_type_id | discharge_disposition_id | admission_source_id | time_in_hospital | payer_code | MED_SPEC_NUM | num_lab_procedures | num_procedures | num_medications | number_outpatient | number_emergency | number_inpatient | DIAG_CAT_1 | DIAG_CAT_2 | DIAG_CAT_3 | number_diagnoses | max_glu_serum | A1Cresult | metformin | repaglinide | nateglinide | chlorpropamide | glimepiride | acetohexamide | glipizide | glyburide | tolbutamide | pioglitazone | rosiglitazone | acarbose | miglitol | troglitazone | tolazamide | insulin | glyburide.metformin | glipizide.metformin | metformin.rosiglitazone | metformin.pioglitazone | change | diabetesMed | readm2 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| race | 1.000000 | 0.061706 | 0.114255 | 0.040520 | 0.096587 | 0.005805 | 0.033113 | -0.020364 | 0.041640 | -0.030777 | -0.023193 | 0.024391 | 0.022157 | 0.050845 | -0.012812 | -0.006053 | 0.042924 | 0.029594 | 0.016000 | 0.081672 | 0.054576 | -0.013318 | 0.010548 | 0.025466 | -0.004170 | 0.006801 | 0.008261 | 0.001793 | 0.018551 | 0.015784 | -0.001455 | 0.026105 | 0.005938 | 0.013237 | -0.001307 | 0.003106 | 0.003990 | -0.039862 | 0.006384 | 0.005380 | -0.011726 | 0.001793 | 0.008300 | -0.004537 | 0.008626 |
| gender | 0.061706 | 1.000000 | -0.048579 | 0.014491 | 0.014578 | -0.019566 | -0.005222 | -0.031088 | 0.000833 | 0.016623 | -0.004968 | 0.061668 | -0.023819 | -0.005846 | -0.024202 | -0.013405 | -0.034311 | 0.008083 | 0.008343 | -0.007818 | -0.001347 | 0.016539 | 0.001549 | -0.004777 | -0.005390 | 0.006481 | -0.000156 | -0.003935 | 0.026810 | 0.034631 | -0.001727 | 0.002339 | 0.010843 | 0.010581 | 0.009920 | 0.007860 | 0.003242 | 0.000247 | 0.002489 | 0.007965 | 0.004538 | -0.003935 | 0.012476 | 0.015391 | -0.003421 |
| age | 0.114255 | -0.048579 | 1.000000 | 0.005716 | -0.005747 | 0.113970 | 0.041070 | 0.107273 | 0.058032 | -0.068202 | 0.025665 | -0.028360 | 0.039010 | 0.029064 | -0.089149 | -0.047012 | 0.091837 | 0.077541 | 0.052021 | 0.243515 | 0.018618 | -0.147559 | -0.060696 | 0.045565 | 0.020363 | 0.012367 | 0.044360 | 0.002400 | 0.055867 | 0.076798 | 0.010110 | 0.013860 | 0.003034 | 0.008092 | 0.011788 | -0.001978 | 0.003605 | -0.079078 | -0.002451 | 0.003658 | 0.002400 | -0.000257 | -0.037793 | -0.025360 | 0.015077 |
| weight | 0.040520 | 0.014491 | 0.005716 | 1.000000 | 0.037503 | -0.035383 | 0.003026 | 0.023652 | 0.047819 | 0.004630 | 0.090456 | 0.018693 | 0.011274 | 0.104440 | 0.003706 | -0.009154 | 0.023982 | 0.031824 | 0.014000 | 0.054391 | -0.037139 | -0.021109 | 0.007304 | -0.005440 | 0.010707 | -0.000839 | 0.013694 | -0.000736 | 0.017062 | 0.008707 | -0.002326 | 0.026059 | 0.004232 | 0.010411 | -0.003532 | -0.001274 | 0.000692 | -0.076697 | -0.014159 | -0.002207 | -0.000736 | -0.000736 | -0.041219 | -0.030585 | -0.001510 |
| admission_type_id | 0.096587 | 0.014578 | -0.005747 | 0.037503 | 1.000000 | 0.085986 | 0.098007 | -0.014285 | -0.136863 | 0.185351 | -0.145869 | 0.131923 | 0.075711 | 0.030746 | -0.018190 | -0.032648 | 0.032151 | -0.005648 | -0.008918 | -0.113991 | 0.352793 | -0.043929 | 0.008631 | -0.003481 | -0.008099 | 0.007875 | -0.003178 | -0.002988 | 0.007991 | -0.002804 | 0.006347 | 0.018570 | 0.022930 | 0.006061 | -0.001414 | 0.003307 | 0.010291 | -0.025368 | -0.000573 | -0.005046 | -0.002988 | 0.002888 | 0.003992 | -0.003930 | -0.007755 |
| discharge_disposition_id | 0.005805 | -0.019566 | 0.113970 | -0.035383 | 0.085986 | 1.000000 | 0.016614 | 0.161954 | -0.123220 | -0.024028 | 0.022906 | 0.015536 | 0.105415 | -0.006101 | -0.024692 | 0.019240 | 0.034616 | 0.029774 | 0.024778 | 0.049496 | 0.037086 | -0.020713 | -0.008376 | -0.002759 | -0.008790 | 0.018525 | -0.022360 | 0.014597 | -0.013379 | 0.048256 | 0.003228 | -0.014116 | -0.001694 | 0.006779 | 0.005779 | 0.008684 | 0.013139 | -0.041842 | -0.002994 | 0.000933 | -0.002174 | -0.000576 | -0.014047 | -0.029452 | 0.051471 |
| admission_source_id | 0.033113 | -0.005222 | 0.041070 | 0.003026 | 0.098007 | 0.016614 | 1.000000 | -0.006996 | -0.100157 | -0.152760 | 0.046823 | -0.137044 | -0.055016 | 0.028833 | 0.061938 | 0.033697 | -0.007753 | -0.019796 | 0.001447 | 0.076318 | 0.412356 | 0.006512 | -0.033283 | -0.003732 | -0.019612 | 0.002666 | -0.026685 | 0.001296 | 0.009300 | 0.004919 | 0.001791 | -0.005729 | -0.008894 | -0.000753 | -0.000763 | 0.002245 | 0.001834 | 0.005094 | -0.024616 | -0.000281 | 0.001296 | -0.004958 | 0.002583 | 0.000535 | 0.005589 |
| time_in_hospital | -0.020364 | -0.031088 | 0.107273 | 0.023652 | -0.014285 | 0.161954 | -0.006996 | 1.000000 | -0.037805 | 0.023146 | 0.318234 | 0.193139 | 0.468752 | -0.003410 | -0.005467 | 0.079929 | -0.019913 | 0.086503 | 0.068677 | 0.224265 | 0.029079 | 0.058088 | -0.009071 | 0.034985 | 0.003320 | 0.004094 | 0.016086 | 0.013596 | 0.016737 | 0.023482 | 0.001799 | 0.008521 | 0.008531 | 0.007231 | 0.005083 | 0.004746 | 0.000328 | 0.101223 | -0.006358 | -0.001692 | -0.003396 | 0.002268 | 0.112359 | 0.059464 | 0.041582 |
| payer_code | 0.041640 | 0.000833 | 0.058032 | 0.047819 | -0.136863 | -0.123220 | -0.100157 | -0.037805 | 1.000000 | -0.082746 | -0.049680 | -0.047581 | 0.005658 | 0.062572 | 0.067316 | 0.009598 | 0.008458 | 0.036335 | 0.033135 | 0.076424 | -0.095739 | -0.006824 | 0.027596 | 0.032986 | 0.014676 | -0.022046 | 0.038055 | -0.004231 | 0.005875 | -0.047599 | -0.002662 | 0.034867 | -0.008782 | -0.002629 | 0.011455 | -0.007329 | -0.015677 | 0.115265 | 0.055730 | 0.010871 | 0.009326 | -0.000358 | 0.121010 | 0.077597 | -0.007707 |
| MED_SPEC_NUM | -0.030777 | 0.016623 | -0.068202 | 0.004630 | 0.185351 | -0.024028 | -0.152760 | 0.023146 | -0.082746 | 1.000000 | -0.068863 | 0.076952 | 0.036943 | -0.051445 | -0.009879 | -0.013909 | 0.018820 | -0.019354 | -0.015192 | -0.176693 | -0.003316 | -0.009813 | 0.023068 | 0.010220 | 0.006590 | 0.002161 | 0.012798 | -0.002891 | 0.007273 | -0.005929 | -0.001944 | 0.002210 | 0.016639 | -0.005808 | 0.002816 | -0.002660 | -0.004689 | -0.014342 | 0.000051 | -0.006234 | -0.002891 | 0.010115 | -0.005111 | -0.002299 | -0.014395 |
| num_lab_procedures | -0.023193 | -0.004968 | 0.025665 | 0.090456 | -0.145869 | 0.022906 | 0.046823 | 0.318234 | -0.049680 | -0.068863 | 1.000000 | 0.055081 | 0.267707 | -0.008437 | 0.000613 | 0.037763 | -0.071046 | 0.011204 | 0.011021 | 0.149116 | -0.124907 | 0.236383 | -0.044042 | 0.010438 | -0.008292 | -0.005659 | 0.005344 | 0.005344 | 0.012450 | -0.001768 | -0.001320 | -0.015599 | -0.010260 | -0.000654 | -0.002963 | 0.005036 | 0.000008 | 0.085401 | -0.010852 | -0.006685 | 0.001689 | -0.004330 | 0.062801 | 0.030903 | 0.018358 |
| num_procedures | 0.024391 | 0.061668 | -0.028360 | 0.018693 | 0.131923 | 0.015536 | -0.137044 | 0.193139 | -0.047581 | 0.076952 | 0.055081 | 1.000000 | 0.387685 | -0.028257 | -0.033659 | -0.061114 | -0.056866 | 0.036607 | 0.025920 | 0.074394 | -0.069910 | -0.017477 | -0.038122 | 0.005662 | -0.002359 | 0.004757 | 0.007223 | 0.006615 | 0.004999 | 0.001531 | -0.003423 | 0.016471 | 0.018742 | -0.000362 | -0.001521 | -0.005745 | 0.005154 | 0.015020 | -0.000553 | -0.006640 | -0.003317 | -0.000834 | 0.005976 | -0.009904 | -0.013332 |
| num_medications | 0.022157 | -0.023819 | 0.039010 | 0.011274 | 0.075711 | 0.105415 | -0.055016 | 0.468752 | 0.005658 | 0.036943 | 0.267707 | 0.387685 | 1.000000 | 0.047313 | 0.017129 | 0.066793 | 0.004288 | 0.084268 | 0.063166 | 0.263311 | 0.001639 | 0.013044 | 0.069433 | 0.019283 | 0.023352 | -0.000940 | 0.045223 | 0.009348 | 0.056985 | 0.030886 | 0.002943 | 0.071584 | 0.052860 | 0.017947 | 0.006422 | 0.002992 | -0.002113 | 0.198963 | 0.013382 | 0.002757 | -0.002603 | 0.002074 | 0.248529 | 0.186247 | 0.032318 |
| number_outpatient | 0.050845 | -0.005846 | 0.029064 | 0.104440 | 0.030746 | -0.006101 | 0.028833 | -0.003410 | 0.062572 | -0.051445 | -0.008437 | -0.028257 | 0.047313 | 1.000000 | 0.087824 | 0.103471 | -0.009347 | 0.028015 | 0.026595 | 0.093518 | 0.054949 | -0.024324 | -0.013006 | 0.001026 | 0.002719 | -0.004402 | -0.009039 | -0.001242 | 0.010527 | -0.000482 | 0.000350 | 0.012212 | -0.001550 | 0.009388 | -0.002243 | -0.002152 | -0.005556 | 0.010029 | -0.008428 | 0.003037 | -0.001242 | -0.001242 | 0.027105 | 0.017340 | 0.017448 |
| number_emergency | -0.012812 | -0.024202 | -0.089149 | 0.003706 | -0.018190 | -0.024692 | 0.061938 | -0.005467 | 0.067316 | -0.009879 | 0.000613 | -0.033659 | 0.017129 | 0.087824 | 1.000000 | 0.279626 | -0.023803 | -0.004155 | 0.007427 | 0.059398 | 0.035679 | -0.004270 | -0.009572 | 0.007820 | 0.005489 | -0.004218 | 0.003318 | -0.000907 | -0.003426 | -0.027870 | -0.002870 | -0.001978 | -0.006844 | 0.004224 | -0.000207 | -0.001572 | -0.004059 | 0.048501 | 0.001956 | -0.002723 | -0.000907 | -0.000907 | 0.041797 | 0.029415 | 0.053723 |
| number_inpatient | -0.006053 | -0.013405 | -0.047012 | -0.009154 | -0.032648 | 0.019240 | 0.033697 | 0.079929 | 0.009598 | -0.013909 | 0.037763 | -0.061114 | 0.066793 | 0.103471 | 0.279626 | 1.000000 | -0.004620 | 0.024244 | 0.032150 | 0.102473 | 0.038503 | -0.049379 | -0.073780 | 0.011936 | -0.006284 | -0.008317 | -0.016545 | -0.002118 | -0.022736 | -0.036659 | -0.003545 | -0.026804 | -0.021471 | 0.000411 | -0.003851 | -0.003669 | -0.003526 | 0.060505 | -0.008426 | -0.000813 | 0.001207 | -0.002118 | 0.025420 | 0.025559 | 0.162676 |
| DIAG_CAT_1 | 0.042924 | -0.034311 | 0.091837 | 0.023982 | 0.032151 | 0.034616 | -0.007753 | -0.019913 | 0.008458 | 0.018820 | -0.071046 | -0.056866 | 0.004288 | -0.009347 | -0.023803 | -0.004620 | 1.000000 | 0.025858 | 0.028021 | 0.046451 | -0.016030 | -0.091392 | 0.033199 | 0.002242 | -0.000440 | -0.002017 | 0.000410 | 0.001038 | 0.010541 | 0.017872 | 0.006039 | 0.024890 | 0.010041 | 0.003061 | 0.006030 | 0.000456 | 0.000745 | -0.075260 | 0.015281 | 0.004664 | -0.003029 | 0.003943 | -0.033688 | -0.028985 | -0.006205 |
| DIAG_CAT_2 | 0.029594 | 0.008083 | 0.077541 | 0.031824 | -0.005648 | 0.029774 | -0.019796 | 0.086503 | 0.036335 | -0.019354 | 0.011204 | 0.036607 | 0.084268 | 0.028015 | -0.004155 | 0.024244 | 0.025858 | 1.000000 | 0.081391 | 0.171521 | -0.017962 | -0.044930 | -0.018313 | 0.003082 | -0.000322 | -0.004128 | 0.006773 | 0.002264 | 0.004223 | 0.010435 | 0.000339 | 0.000030 | -0.010618 | 0.000704 | 0.005705 | 0.001300 | -0.003710 | -0.007776 | -0.007621 | 0.005659 | -0.000006 | -0.005116 | -0.006439 | -0.010210 | 0.004473 |
| DIAG_CAT_3 | 0.016000 | 0.008343 | 0.052021 | 0.014000 | -0.008918 | 0.024778 | 0.001447 | 0.068677 | 0.033135 | -0.015192 | 0.011021 | 0.025920 | 0.063166 | 0.026595 | 0.007427 | 0.032150 | 0.028021 | 0.081391 | 1.000000 | 0.186667 | -0.009693 | -0.031716 | -0.024179 | 0.005636 | 0.003922 | -0.007445 | -0.010677 | 0.000333 | -0.005554 | -0.005157 | -0.002879 | -0.008180 | -0.003303 | 0.000458 | -0.000319 | -0.000620 | -0.003145 | 0.013942 | -0.000101 | 0.006007 | 0.006032 | -0.004330 | 0.005824 | -0.007452 | 0.012287 |
| number_diagnoses | 0.081672 | -0.007818 | 0.243515 | 0.054391 | -0.113991 | 0.049496 | 0.076318 | 0.224265 | 0.076424 | -0.176693 | 0.149116 | 0.074394 | 0.263311 | 0.093518 | 0.059398 | 0.102473 | 0.046451 | 0.171521 | 0.186667 | 1.000000 | -0.036161 | -0.032983 | -0.073736 | 0.033225 | 0.012336 | -0.014080 | 0.013640 | 0.003449 | -0.005975 | -0.024247 | 0.001220 | 0.002278 | -0.011524 | 0.007741 | -0.000293 | 0.004710 | -0.013444 | 0.076730 | -0.005894 | -0.006428 | 0.003449 | -0.007491 | 0.055250 | 0.019375 | 0.045801 |
| max_glu_serum | 0.054576 | -0.001347 | 0.018618 | -0.037139 | 0.352793 | 0.037086 | 0.412356 | 0.029079 | -0.095739 | -0.003316 | -0.124907 | -0.069910 | 0.001639 | 0.054949 | 0.035679 | 0.038503 | -0.016030 | -0.017962 | -0.009693 | -0.036161 | 1.000000 | -0.043540 | -0.029790 | -0.015106 | -0.016794 | 0.008938 | -0.031840 | -0.000902 | 0.005931 | 0.000373 | 0.006437 | -0.014531 | -0.009275 | 0.005479 | -0.004328 | -0.001562 | -0.004032 | 0.000884 | -0.014296 | -0.002705 | -0.000902 | -0.000902 | 0.008958 | -0.005206 | 0.011038 |
| A1Cresult | -0.013318 | 0.016539 | -0.147559 | -0.021109 | -0.043929 | -0.020713 | 0.006512 | 0.058088 | -0.006824 | -0.009813 | 0.236383 | -0.017477 | 0.013044 | -0.024324 | -0.004270 | -0.049379 | -0.091392 | -0.044930 | -0.031716 | -0.032983 | -0.043540 | 1.000000 | 0.051894 | 0.022541 | -0.000669 | -0.003225 | 0.022787 | -0.001747 | 0.020844 | 0.009977 | -0.005526 | 0.000223 | 0.009548 | 0.009374 | 0.007741 | -0.003026 | -0.000390 | 0.107227 | -0.005008 | 0.001082 | -0.001747 | -0.001747 | 0.105614 | 0.086291 | -0.014302 |
| metformin | 0.010548 | 0.001549 | -0.060696 | 0.007304 | 0.008631 | -0.008376 | -0.033283 | -0.009071 | 0.027596 | 0.023068 | -0.044042 | -0.038122 | 0.069433 | -0.013006 | -0.009572 | -0.073780 | 0.033199 | -0.018313 | -0.024179 | -0.073736 | -0.029790 | 0.051894 | 1.000000 | -0.001074 | 0.020372 | -0.011841 | 0.047475 | -0.002068 | 0.077111 | 0.129061 | -0.006539 | 0.060566 | 0.097708 | 0.006246 | 0.005628 | -0.003582 | 0.004664 | -0.017392 | -0.021191 | -0.002748 | 0.008300 | 0.003116 | 0.325302 | 0.267566 | -0.027269 |
| repaglinide | 0.025466 | -0.004777 | 0.045565 | -0.005440 | -0.003481 | -0.002759 | -0.003732 | 0.034985 | 0.032986 | 0.010220 | 0.010438 | 0.005662 | 0.019283 | 0.001026 | 0.007820 | 0.011936 | 0.002242 | 0.003082 | 0.005636 | 0.033225 | -0.015106 | 0.022541 | -0.001074 | 1.000000 | -0.003246 | -0.003466 | -0.007518 | -0.000511 | -0.015927 | -0.024160 | -0.001617 | 0.019393 | 0.009031 | 0.011257 | 0.018066 | -0.000886 | -0.002287 | 0.006058 | -0.004506 | -0.001534 | -0.000511 | -0.000511 | 0.071294 | 0.066174 | 0.006183 |
| nateglinide | -0.004170 | -0.005390 | 0.020363 | 0.010707 | -0.008099 | -0.008790 | -0.019612 | 0.003320 | 0.014676 | 0.006590 | -0.008292 | -0.002359 | 0.023352 | 0.002719 | 0.005489 | -0.006284 | -0.000440 | -0.000322 | 0.003922 | 0.012336 | -0.016794 | -0.000669 | 0.020372 | -0.003246 | 1.000000 | -0.002386 | 0.004488 | -0.000352 | -0.018191 | -0.020817 | -0.001113 | 0.025830 | 0.013947 | -0.004585 | 0.018302 | -0.000610 | -0.001575 | 0.001396 | -0.006775 | -0.001056 | -0.000352 | -0.000352 | 0.052927 | 0.045552 | 0.001154 |
| chlorpropamide | 0.006801 | 0.006481 | 0.012367 | -0.000839 | 0.007875 | 0.018525 | 0.002666 | 0.004094 | -0.022046 | 0.002161 | -0.005659 | 0.004757 | -0.000940 | -0.004402 | -0.004218 | -0.008317 | -0.002017 | -0.004128 | -0.007445 | -0.014080 | 0.008938 | -0.003225 | -0.011841 | -0.003466 | -0.002386 | 1.000000 | -0.006537 | -0.000121 | -0.010745 | -0.005823 | -0.000383 | -0.007959 | -0.000123 | -0.001576 | -0.000581 | -0.000210 | -0.000541 | -0.020008 | -0.002329 | -0.000363 | -0.000121 | -0.000121 | -0.007035 | 0.015661 | -0.008312 |
| glimepiride | 0.008261 | -0.000156 | 0.044360 | 0.013694 | -0.003178 | -0.022360 | -0.026685 | 0.016086 | 0.038055 | 0.012798 | 0.005344 | 0.007223 | 0.045223 | -0.009039 | 0.003318 | -0.016545 | 0.000410 | 0.006773 | -0.010677 | 0.013640 | -0.031840 | 0.022787 | 0.047475 | -0.007518 | 0.004488 | -0.006537 | 1.000000 | -0.000964 | -0.071983 | -0.067334 | -0.003050 | 0.042601 | 0.038655 | 0.018418 | 0.019830 | 0.009191 | -0.004314 | 0.012479 | -0.012202 | -0.002894 | -0.000964 | -0.000964 | 0.138970 | 0.124797 | -0.001818 |
| acetohexamide | 0.001793 | -0.003935 | 0.002400 | -0.000736 | -0.002988 | 0.014597 | 0.001296 | 0.013596 | -0.004231 | -0.002891 | 0.005344 | 0.006615 | 0.009348 | -0.001242 | -0.000907 | -0.002118 | 0.001038 | 0.002264 | 0.000333 | 0.003449 | -0.000902 | -0.001747 | -0.002068 | -0.000511 | -0.000352 | -0.000121 | -0.000964 | 1.000000 | -0.001585 | -0.001414 | -0.000056 | -0.001174 | -0.001085 | -0.000233 | -0.000086 | -0.000031 | -0.000080 | 0.003607 | -0.000344 | -0.000054 | -0.000018 | -0.000018 | 0.004554 | 0.002311 | -0.001503 |
| glipizide | 0.018551 | 0.026810 | 0.055867 | 0.017062 | 0.007991 | -0.013379 | 0.009300 | 0.016737 | 0.005875 | 0.007273 | 0.012450 | 0.004999 | 0.056985 | 0.010527 | -0.003426 | -0.022736 | 0.010541 | 0.004223 | -0.005554 | -0.005975 | 0.005931 | 0.020844 | 0.077111 | -0.015927 | -0.018191 | -0.010745 | -0.071983 | -0.001585 | 1.000000 | -0.104495 | -0.005014 | 0.049752 | 0.041498 | 0.030598 | 0.002971 | -0.002746 | -0.001524 | -0.027179 | -0.027923 | -0.000607 | -0.001585 | -0.001585 | 0.194260 | 0.205145 | 0.003333 |
| glyburide | 0.015784 | 0.034631 | 0.076798 | 0.008707 | -0.002804 | 0.048256 | 0.004919 | 0.023482 | -0.047599 | -0.005929 | -0.001768 | 0.001531 | 0.030886 | -0.000482 | -0.027870 | -0.036659 | 0.017872 | 0.010435 | -0.005157 | -0.024247 | 0.000373 | 0.009977 | 0.129061 | -0.024160 | -0.020817 | -0.005823 | -0.067334 | -0.001414 | -0.104495 | 1.000000 | -0.004473 | 0.027727 | 0.030766 | 0.015094 | -0.000056 | -0.002450 | -0.006327 | -0.071853 | -0.006909 | 0.000245 | -0.001414 | -0.001414 | 0.172392 | 0.183024 | -0.004985 |
| tolbutamide | -0.001455 | -0.001727 | 0.010110 | -0.002326 | 0.006347 | 0.003228 | 0.001791 | 0.001799 | -0.002662 | -0.001944 | -0.001320 | -0.003423 | 0.002943 | 0.000350 | -0.002870 | -0.003545 | 0.006039 | 0.000339 | -0.002879 | 0.001220 | 0.006437 | -0.005526 | -0.006539 | -0.001617 | -0.001113 | -0.000383 | -0.003050 | -0.000056 | -0.005014 | -0.004473 | 1.000000 | -0.003714 | -0.003432 | -0.000736 | -0.000271 | -0.000098 | -0.000253 | -0.001925 | -0.001087 | -0.000169 | -0.000056 | -0.000056 | 0.001000 | 0.007308 | -0.004752 |
| pioglitazone | 0.026105 | 0.002339 | 0.013860 | 0.026059 | 0.018570 | -0.014116 | -0.005729 | 0.008521 | 0.034867 | 0.002210 | -0.015599 | 0.016471 | 0.071584 | 0.012212 | -0.001978 | -0.026804 | 0.024890 | 0.000030 | -0.008180 | 0.002278 | -0.014531 | 0.000223 | 0.060566 | 0.019393 | 0.025830 | -0.007959 | 0.042601 | -0.001174 | 0.049752 | 0.027727 | -0.003714 | 1.000000 | -0.062763 | 0.015377 | 0.000791 | -0.002034 | -0.001659 | 0.003954 | 0.022117 | 0.007190 | -0.001174 | 0.014894 | 0.203180 | 0.151949 | -0.005122 |
| rosiglitazone | 0.005938 | 0.010843 | 0.003034 | 0.004232 | 0.022930 | -0.001694 | -0.008894 | 0.008531 | -0.008782 | 0.016639 | -0.010260 | 0.018742 | 0.052860 | -0.001550 | -0.006844 | -0.021471 | 0.010041 | -0.010618 | -0.003303 | -0.011524 | -0.009275 | 0.009548 | 0.097708 | 0.009031 | 0.013947 | -0.000123 | 0.038655 | -0.001085 | 0.041498 | 0.030766 | -0.003432 | -0.062763 | 1.000000 | 0.002006 | 0.003416 | 0.008079 | -0.000996 | 0.004080 | 0.003340 | -0.003256 | -0.001085 | -0.001085 | 0.191641 | 0.140410 | -0.009096 |
| acarbose | 0.013237 | 0.010581 | 0.008092 | 0.010411 | 0.006061 | 0.006779 | -0.000753 | 0.007231 | -0.002629 | -0.005808 | -0.000654 | -0.000362 | 0.017947 | 0.009388 | 0.004224 | 0.000411 | 0.003061 | 0.000704 | 0.000458 | 0.007741 | 0.005479 | 0.009374 | 0.006246 | 0.011257 | -0.004585 | -0.001576 | 0.018418 | -0.000233 | 0.030598 | 0.015094 | -0.000736 | 0.015377 | 0.002006 | 1.000000 | -0.001117 | -0.000403 | -0.001040 | -0.001790 | 0.013046 | -0.000698 | -0.000233 | -0.000233 | 0.047261 | 0.030097 | -0.002534 |
| miglitol | -0.001307 | 0.009920 | 0.011788 | -0.003532 | -0.001414 | 0.005779 | -0.000763 | 0.005083 | 0.011455 | 0.002816 | -0.002963 | -0.001521 | 0.006422 | -0.002243 | -0.000207 | -0.003851 | 0.006030 | 0.005705 | -0.000319 | -0.000293 | -0.004328 | 0.007741 | 0.005628 | 0.018066 | 0.018302 | -0.000581 | 0.019830 | -0.000086 | 0.002971 | -0.000056 | -0.000271 | 0.000791 | 0.003416 | -0.001117 | 1.000000 | -0.000148 | -0.000383 | 0.000451 | -0.001650 | -0.000257 | -0.000086 | -0.000086 | 0.018472 | 0.011094 | -0.001857 |
| troglitazone | 0.003106 | 0.007860 | -0.001978 | -0.001274 | 0.003307 | 0.008684 | 0.002245 | 0.004746 | -0.007329 | -0.002660 | 0.005036 | -0.005745 | 0.002992 | -0.002152 | -0.001572 | -0.003669 | 0.000456 | 0.001300 | -0.000620 | 0.004710 | -0.001562 | -0.003026 | -0.003582 | -0.000886 | -0.000610 | -0.000210 | 0.009191 | -0.000031 | -0.002746 | -0.002450 | -0.000098 | -0.002034 | 0.008079 | -0.000403 | -0.000148 | 1.000000 | -0.000138 | -0.000391 | -0.000595 | -0.000093 | -0.000031 | -0.000031 | 0.007888 | 0.004002 | -0.002602 |
| tolazamide | 0.003990 | 0.003242 | 0.003605 | 0.000692 | 0.010291 | 0.013139 | 0.001834 | 0.000328 | -0.015677 | -0.004689 | 0.000008 | 0.005154 | -0.002113 | -0.005556 | -0.004059 | -0.003526 | 0.000745 | -0.003710 | -0.003145 | -0.013444 | -0.004032 | -0.000390 | 0.004664 | -0.002287 | -0.001575 | -0.000541 | -0.004314 | -0.000080 | -0.001524 | -0.006327 | -0.000253 | -0.001659 | -0.000996 | -0.001040 | -0.000383 | -0.000138 | 1.000000 | -0.013867 | -0.001537 | -0.000240 | -0.000080 | -0.000080 | -0.002376 | 0.010336 | -0.006721 |
| insulin | -0.039862 | 0.000247 | -0.079078 | -0.076697 | -0.025368 | -0.041842 | 0.005094 | 0.101223 | 0.115265 | -0.014342 | 0.085401 | 0.015020 | 0.198963 | 0.010029 | 0.048501 | 0.060505 | -0.075260 | -0.007776 | 0.013942 | 0.076730 | 0.000884 | 0.107227 | -0.017392 | 0.006058 | 0.001396 | -0.020008 | 0.012479 | 0.003607 | -0.027179 | -0.071853 | -0.001925 | 0.003954 | 0.004080 | -0.001790 | 0.000451 | -0.000391 | -0.013867 | 1.000000 | 0.005828 | -0.000677 | 0.003607 | 0.003607 | 0.461502 | 0.525169 | 0.024743 |
| glyburide.metformin | 0.006384 | 0.002489 | -0.002451 | -0.014159 | -0.000573 | -0.002994 | -0.024616 | -0.006358 | 0.055730 | 0.000051 | -0.010852 | -0.000553 | 0.013382 | -0.008428 | 0.001956 | -0.008426 | 0.015281 | -0.007621 | -0.000101 | -0.005894 | -0.014296 | -0.005008 | -0.021191 | -0.004506 | -0.006775 | -0.002329 | -0.012202 | -0.000344 | -0.027923 | -0.006909 | -0.001087 | 0.022117 | 0.003340 | 0.013046 | -0.001650 | -0.000595 | -0.001537 | 0.005828 | 1.000000 | 0.050992 | -0.000344 | -0.000344 | 0.038712 | 0.044474 | -0.005937 |
| glipizide.metformin | 0.005380 | 0.007965 | 0.003658 | -0.002207 | -0.005046 | 0.000933 | -0.000281 | -0.001692 | 0.010871 | -0.006234 | -0.006685 | -0.006640 | 0.002757 | 0.003037 | -0.002723 | -0.000813 | 0.004664 | 0.005659 | 0.006007 | -0.006428 | -0.002705 | 0.001082 | -0.002748 | -0.001534 | -0.001056 | -0.000363 | -0.002894 | -0.000054 | -0.000607 | 0.000245 | -0.000169 | 0.007190 | -0.003256 | -0.000698 | -0.000257 | -0.000093 | -0.000240 | -0.000677 | 0.050992 | 1.000000 | -0.000054 | -0.000054 | 0.010838 | 0.006933 | -0.000045 |
| metformin.rosiglitazone | -0.011726 | 0.004538 | 0.002400 | -0.000736 | -0.002988 | -0.002174 | 0.001296 | -0.003396 | 0.009326 | -0.002891 | 0.001689 | -0.003317 | -0.002603 | -0.001242 | -0.000907 | 0.001207 | -0.003029 | -0.000006 | 0.006032 | 0.003449 | -0.000902 | -0.001747 | 0.008300 | -0.000511 | -0.000352 | -0.000121 | -0.000964 | -0.000018 | -0.001585 | -0.001414 | -0.000056 | -0.001174 | -0.001085 | -0.000233 | -0.000086 | -0.000031 | -0.000080 | 0.003607 | -0.000344 | -0.000054 | 1.000000 | -0.000018 | 0.004554 | 0.002311 | -0.001503 |
| metformin.pioglitazone | 0.001793 | -0.003935 | -0.000257 | -0.000736 | 0.002888 | -0.000576 | -0.004958 | 0.002268 | -0.000358 | 0.010115 | -0.004330 | -0.000834 | 0.002074 | -0.001242 | -0.000907 | -0.002118 | 0.003943 | -0.005116 | -0.004330 | -0.007491 | -0.000902 | -0.001747 | 0.003116 | -0.000511 | -0.000352 | -0.000121 | -0.000964 | -0.000018 | -0.001585 | -0.001414 | -0.000056 | 0.014894 | -0.001085 | -0.000233 | -0.000086 | -0.000031 | -0.000080 | 0.003607 | -0.000344 | -0.000054 | -0.000018 | 1.000000 | 0.004554 | 0.002311 | -0.001503 |
| change | 0.008300 | 0.012476 | -0.037793 | -0.041219 | 0.003992 | -0.014047 | 0.002583 | 0.112359 | 0.121010 | -0.005111 | 0.062801 | 0.005976 | 0.248529 | 0.027105 | 0.041797 | 0.025420 | -0.033688 | -0.006439 | 0.005824 | 0.055250 | 0.008958 | 0.105614 | 0.325302 | 0.071294 | 0.052927 | -0.007035 | 0.138970 | 0.004554 | 0.194260 | 0.172392 | 0.001000 | 0.203180 | 0.191641 | 0.047261 | 0.018472 | 0.007888 | -0.002376 | 0.461502 | 0.038712 | 0.010838 | 0.004554 | 0.004554 | 1.000000 | 0.507411 | 0.018728 |
| diabetesMed | -0.004537 | 0.015391 | -0.025360 | -0.030585 | -0.003930 | -0.029452 | 0.000535 | 0.059464 | 0.077597 | -0.002299 | 0.030903 | -0.009904 | 0.186247 | 0.017340 | 0.029415 | 0.025559 | -0.028985 | -0.010210 | -0.007452 | 0.019375 | -0.005206 | 0.086291 | 0.267566 | 0.066174 | 0.045552 | 0.015661 | 0.124797 | 0.002311 | 0.205145 | 0.183024 | 0.007308 | 0.151949 | 0.140410 | 0.030097 | 0.011094 | 0.004002 | 0.010336 | 0.525169 | 0.044474 | 0.006933 | 0.002311 | 0.002311 | 0.507411 | 1.000000 | 0.024819 |
| readm2 | 0.008626 | -0.003421 | 0.015077 | -0.001510 | -0.007755 | 0.051471 | 0.005589 | 0.041582 | -0.007707 | -0.014395 | 0.018358 | -0.013332 | 0.032318 | 0.017448 | 0.053723 | 0.162676 | -0.006205 | 0.004473 | 0.012287 | 0.045801 | 0.011038 | -0.014302 | -0.027269 | 0.006183 | 0.001154 | -0.008312 | -0.001818 | -0.001503 | 0.003333 | -0.004985 | -0.004752 | -0.005122 | -0.009096 | -0.002534 | -0.001857 | -0.002602 | -0.006721 | 0.024743 | -0.005937 | -0.000045 | -0.001503 | -0.001503 | 0.018728 | 0.024819 | 1.000000 |
Preliminary correlations with readm2 have changed compared with readmitted¶
- number_emergency: was 0.103321 against readmitted; now 0.054 against readm2, no longer a meaningful correlation
- number_inpatient: was 0.233149 against readmitted; now 0.163 against readm2, the only feature still showing any meaningful correlation
- number_diagnoses: was 0.103885 against readmitted; now 0.046 against readm2, no longer a meaningful correlation
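Reading one column out of a 45x45 correlation matrix by eye is error-prone, so the comparison above can also be done by ranking the target column directly. The following is a minimal sketch on synthetic data; the `demo` frame, its three feature columns, and the helper `rank_target_correlations` are illustrative assumptions, not part of the original analysis.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for the notebook's df (hypothetical values, not the real data).
rng = np.random.RandomState(0)
n = 2000
demo = pd.DataFrame({
    'number_inpatient': rng.poisson(0.6, n),
    'number_emergency': rng.poisson(0.2, n),
    'weight': rng.randint(0, 3, n),
})
# The target is built to depend on number_inpatient only.
p = 1.0 / (1.0 + np.exp(-(-2.5 + 0.8 * demo['number_inpatient'])))
demo['readm2'] = (rng.uniform(size=n) < p).astype(int)


def rank_target_correlations(frame, target):
    """Return each feature's correlation with `target`, strongest first."""
    corr = frame.corr()[target].drop(target)
    order = corr.abs().sort_values(ascending=False).index
    return corr.reindex(order)


ranked = rank_target_correlations(demo, 'readm2')
print(ranked)
```

Applied to the real frame, the same call would be `rank_target_correlations(df, 'readm2')`.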
# heatmap for correlation
plt.figure(figsize=(35,36))
sns.heatmap(df.corr(), annot=True)
<matplotlib.axes._subplots.AxesSubplot at 0x484f2c88>
# input model building slide here
Image('Images/christensen_finalprj/Slide10.png')
Classification Model building.¶
- Need to narrow down the number of factors in order to focus on the most significant ones
ExtraTreesClassifier¶
# Set Y and X
y = df['readm2']
X = df.drop(['readm2'], axis=1)
X.head(12)
| | race | gender | age | weight | admission_type_id | discharge_disposition_id | admission_source_id | time_in_hospital | payer_code | MED_SPEC_NUM | num_lab_procedures | num_procedures | num_medications | number_outpatient | number_emergency | number_inpatient | DIAG_CAT_1 | DIAG_CAT_2 | DIAG_CAT_3 | number_diagnoses | max_glu_serum | A1Cresult | metformin | repaglinide | nateglinide | chlorpropamide | glimepiride | acetohexamide | glipizide | glyburide | tolbutamide | pioglitazone | rosiglitazone | acarbose | miglitol | troglitazone | tolazamide | insulin | glyburide.metformin | glipizide.metformin | metformin.rosiglitazone | metformin.pioglitazone | change | diabetesMed |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 3 | 0 | 8 | 0 | 3 | 1 | 4 | 5 | 0 | 0 | 39 | 3 | 11 | 0 | 0 | 0 | 10 | 4 | 18 | 7 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 2 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 |
| 1 | 3 | 0 | 7 | 0 | 5 | 3 | 1 | 6 | 7 | 0 | 79 | 1 | 25 | 3 | 0 | 0 | 16 | 13 | 16 | 9 | 0 | 0 | 2 | 0 | 0 | 0 | 0 | 0 | 0 | 2 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 3 | 0 | 0 | 0 | 0 | 1 | 1 |
| 2 | 5 | 0 | 6 | 0 | 1 | 22 | 7 | 4 | 14 | 18 | 29 | 2 | 18 | 0 | 0 | 1 | 24 | 18 | 2 | 9 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 2 | 0 | 0 | 0 | 0 | 0 | 1 |
| 3 | 3 | 1 | 7 | 0 | 1 | 1 | 7 | 3 | 0 | 18 | 72 | 3 | 18 | 0 | 0 | 0 | 17 | 4 | 3 | 9 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 2 | 0 | 0 | 0 | 0 | 0 | 1 |
| 4 | 3 | 0 | 3 | 0 | 2 | 1 | 1 | 3 | 0 | 0 | 21 | 1 | 6 | 0 | 0 | 0 | 23 | 18 | 32 | 9 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 5 | 3 | 1 | 7 | 0 | 2 | 1 | 1 | 2 | 0 | 18 | 4 | 0 | 7 | 0 | 0 | 0 | 14 | 9 | 3 | 8 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 6 | 3 | 0 | 4 | 0 | 1 | 1 | 7 | 6 | 14 | 33 | 89 | 0 | 25 | 0 | 2 | 1 | 25 | 10 | 16 | 9 | 0 | 0 | 2 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 2 | 0 | 0 | 0 | 0 | 1 | 1 |
| 7 | 3 | 0 | 7 | 0 | 1 | 6 | 7 | 4 | 6 | 0 | 63 | 0 | 22 | 0 | 2 | 4 | 16 | 3 | 3 | 5 | 0 | 0 | 2 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 |
| 8 | 3 | 1 | 6 | 0 | 1 | 1 | 7 | 6 | 7 | 0 | 45 | 0 | 24 | 0 | 0 | 0 | 16 | 9 | 3 | 7 | 0 | 3 | 2 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 2 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 1 | 1 |
| 9 | 3 | 1 | 8 | 0 | 1 | 1 | 7 | 2 | 3 | 0 | 45 | 0 | 13 | 0 | 0 | 0 | 17 | 3 | 3 | 9 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 2 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 1 | 1 |
| 10 | 3 | 1 | 6 | 0 | 2 | 1 | 1 | 3 | 7 | 0 | 57 | 6 | 21 | 0 | 0 | 0 | 12 | 10 | 4 | 9 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 2 | 0 | 0 | 0 | 0 | 0 | 1 |
| 11 | 3 | 0 | 4 | 0 | 6 | 1 | 17 | 6 | 0 | 18 | 81 | 0 | 26 | 0 | 0 | 0 | 13 | 21 | 21 | 9 | 0 | 3 | 0 | 0 | 0 | 0 | 0 | 0 | 2 | 0 | 0 | 2 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 1 |
# build logisticRegression
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)
lr = LogisticRegression()
lr.fit(X_train, y_train)
LogisticRegression(C=1.0, class_weight=None, dual=False, fit_intercept=True,
intercept_scaling=1, max_iter=100, multi_class='ovr', n_jobs=1,
penalty='l2', random_state=None, solver='liblinear', tol=0.0001,
verbose=0, warm_start=False)
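# build ExtraTreesClassifier on the full data set to rank the features
# (no random_state is set, so the importances vary slightly between runs)
model_extra = ExtraTreesClassifier()
model_extra.fit(X, y)
# accuracy on the same rows used for fitting (training accuracy, not a held-out score)
model_extra.score(X, y)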
# display the relative importance of each attribute
print(model_extra.feature_importances_)
[ 2.91794982e-02 2.53679647e-02 5.37078564e-02 7.50331122e-03 3.35746117e-02 4.57729444e-02 2.97595433e-02 5.66506897e-02 4.27513308e-02 4.18174543e-02 6.59548839e-02 4.37842852e-02 6.46074445e-02 2.93875786e-02 2.45889522e-02 5.46483089e-02 5.79017039e-02 5.91816193e-02 5.91892575e-02 4.36703730e-02 7.38929730e-03 1.52840833e-02 1.17254508e-02 4.00583871e-03 2.24838652e-03 8.73138518e-05 8.64359359e-03 1.86232908e-06 1.34819531e-02 1.19863156e-02 2.40050594e-05 9.38387138e-03 7.51014406e-03 1.08885851e-03 1.67479741e-04 4.64248888e-06 3.14279024e-05 2.30591406e-02 1.41216087e-03 8.83976179e-05 0.00000000e+00 3.74683575e-07 9.02443469e-03 4.35135523e-03]
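The `model_extra.score(X, y)` call above measures accuracy on the same rows the trees were fitted on, which for fully grown trees is typically close to 1.0 and says little about generalization. A cross-validated score is a fairer check. This is a minimal sketch on synthetic data, assuming a scikit-learn release that provides `sklearn.model_selection` (the notebook imports the older `sklearn.cross_validation` module); `X_demo`, `y_demo`, and the parameter values are illustrative.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import ExtraTreesClassifier
from sklearn.model_selection import cross_val_score

# Synthetic stand-in data (hypothetical, not the readmission data set).
X_demo, y_demo = make_classification(
    n_samples=500, n_features=10, n_informative=3, random_state=0)

# Training accuracy: scored on the rows used for fitting.
model = ExtraTreesClassifier(n_estimators=50, random_state=0)
train_acc = model.fit(X_demo, y_demo).score(X_demo, y_demo)

# 5-fold cross-validated accuracy: each fold is scored on rows the model has not seen.
cv_scores = cross_val_score(
    ExtraTreesClassifier(n_estimators=50, random_state=0),
    X_demo, y_demo, cv=5)

print("training accuracy:", train_acc)
print("cross-validated accuracy:", cv_scores.mean())
```

Fixing `random_state` also makes the scores and the feature importances reproducible from run to run.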
# What are the highest-ranking X variables according to ExtraTreesClassifier?
print("Features sorted by their rank:")
print(sorted(zip(map(lambda x: round(x, 4), model_extra.feature_importances_), X.columns)))
Features sorted by their rank: [(0.0, 'acetohexamide'), (0.0, 'metformin.pioglitazone'), (0.0, 'metformin.rosiglitazone'), (0.0, 'tolazamide'), (0.0, 'tolbutamide'), (0.0, 'troglitazone'), (0.0001, 'chlorpropamide'), (0.0001, 'glipizide.metformin'), (0.00020000000000000001, 'miglitol'), (0.0011000000000000001, 'acarbose'), (0.0014, 'glyburide.metformin'), (0.0022000000000000001, 'nateglinide'), (0.0040000000000000001, 'repaglinide'), (0.0044000000000000003, 'diabetesMed'), (0.0074000000000000003, 'max_glu_serum'), (0.0074999999999999997, 'rosiglitazone'), (0.0074999999999999997, 'weight'), (0.0086, 'glimepiride'), (0.0089999999999999993, 'change'), (0.0094000000000000004, 'pioglitazone'), (0.0117, 'metformin'), (0.012, 'glyburide'), (0.0135, 'glipizide'), (0.015299999999999999, 'A1Cresult'), (0.023099999999999999, 'insulin'), (0.0246, 'number_emergency'), (0.025399999999999999, 'gender'), (0.0292, 'race'), (0.029399999999999999, 'number_outpatient'), (0.0298, 'admission_source_id'), (0.033599999999999998, 'admission_type_id'), (0.041799999999999997, 'MED_SPEC_NUM'), (0.042799999999999998, 'payer_code'), (0.043700000000000003, 'number_diagnoses'), (0.043799999999999999, 'num_procedures'), (0.0458, 'discharge_disposition_id'), (0.053699999999999998, 'age'), (0.054600000000000003, 'number_inpatient'), (0.0567, 'time_in_hospital'), (0.0579, 'DIAG_CAT_1'), (0.059200000000000003, 'DIAG_CAT_2'), (0.059200000000000003, 'DIAG_CAT_3'), (0.064600000000000005, 'num_medications'), (0.066000000000000003, 'num_lab_procedures')]
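All factors ranked at or below 'repaglinide' in importance will be removed¶
- 'repaglinide' (0.0040 in the output above) and every lower-ranked feature are dropped
- 'diabetesMed' (0.0044 in the output above) and every higher-ranked feature are kept
- Because ExtraTreesClassifier is randomized and no random_state is set, the exact importances differ slightly from run to run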
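The cut can also be derived from the importances themselves rather than from a hand-typed list of column names. The sketch below runs on synthetic data; the column names, the `threshold` value, and `n_estimators=50` are illustrative assumptions, and `random_state=0` is added so that the ranking is reproducible.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import ExtraTreesClassifier

# Synthetic stand-in data (hypothetical, not the readmission data set).
rng = np.random.RandomState(0)
n = 1000
X_demo = pd.DataFrame({
    'signal_a': rng.normal(size=n),
    'signal_b': rng.normal(size=n),
    'constant': np.zeros(n),  # carries no information, so it can never be split on
})
y_demo = (X_demo['signal_a'] + X_demo['signal_b'] > 0).astype(int)

model = ExtraTreesClassifier(n_estimators=50, random_state=0)
model.fit(X_demo, y_demo)

# Label each importance with its column name so the threshold can be applied by name.
importances = pd.Series(model.feature_importances_, index=X_demo.columns)

threshold = 0.005  # hypothetical cutoff; choose it from the sorted importances
to_drop = importances[importances < threshold].index.tolist()
X_reduced = X_demo.drop(to_drop, axis=1)

print(importances.sort_values())
print("dropped:", to_drop)
```

Deriving `to_drop` this way keeps the dropped columns consistent with whatever threshold is stated in the text.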
# drop the low-importance features identified by ExtraTreesClassifier
low_importance = [
    'acetohexamide', 'metformin.pioglitazone', 'metformin.rosiglitazone',
    'tolazamide', 'tolbutamide', 'troglitazone', 'chlorpropamide',
    'glipizide.metformin', 'miglitol', 'acarbose', 'glyburide.metformin',
    'nateglinide', 'repaglinide',
]
df = df.drop(low_importance, axis=1)
# display the result
df.head(12)
| | race | gender | age | weight | admission_type_id | discharge_disposition_id | admission_source_id | time_in_hospital | payer_code | MED_SPEC_NUM | num_lab_procedures | num_procedures | num_medications | number_outpatient | number_emergency | number_inpatient | DIAG_CAT_1 | DIAG_CAT_2 | DIAG_CAT_3 | number_diagnoses | max_glu_serum | A1Cresult | metformin | glimepiride | glipizide | glyburide | pioglitazone | rosiglitazone | insulin | change | diabetesMed | readm2 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 3 | 0 | 8 | 0 | 3 | 1 | 4 | 5 | 0 | 0 | 39 | 3 | 11 | 0 | 0 | 0 | 10 | 4 | 18 | 7 | 0 | 0 | 0 | 0 | 0 | 2 | 0 | 0 | 0 | 0 | 1 | 0 |
| 1 | 3 | 0 | 7 | 0 | 5 | 3 | 1 | 6 | 7 | 0 | 79 | 1 | 25 | 3 | 0 | 0 | 16 | 13 | 16 | 9 | 0 | 0 | 2 | 0 | 0 | 2 | 0 | 0 | 3 | 1 | 1 | 0 |
| 2 | 5 | 0 | 6 | 0 | 1 | 22 | 7 | 4 | 14 | 18 | 29 | 2 | 18 | 0 | 0 | 1 | 24 | 18 | 2 | 9 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 2 | 0 | 1 | 0 |
| 3 | 3 | 1 | 7 | 0 | 1 | 1 | 7 | 3 | 0 | 18 | 72 | 3 | 18 | 0 | 0 | 0 | 17 | 4 | 3 | 9 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 2 | 0 | 1 | 0 |
| 4 | 3 | 0 | 3 | 0 | 2 | 1 | 1 | 3 | 0 | 0 | 21 | 1 | 6 | 0 | 0 | 0 | 23 | 18 | 32 | 9 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 5 | 3 | 1 | 7 | 0 | 2 | 1 | 1 | 2 | 0 | 18 | 4 | 0 | 7 | 0 | 0 | 0 | 14 | 9 | 3 | 8 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 6 | 3 | 0 | 4 | 0 | 1 | 1 | 7 | 6 | 14 | 33 | 89 | 0 | 25 | 0 | 2 | 1 | 25 | 10 | 16 | 9 | 0 | 0 | 2 | 0 | 0 | 0 | 0 | 0 | 2 | 1 | 1 | 0 |
| 7 | 3 | 0 | 7 | 0 | 1 | 6 | 7 | 4 | 6 | 0 | 63 | 0 | 22 | 0 | 2 | 4 | 16 | 3 | 3 | 5 | 0 | 0 | 2 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 |
| 8 | 3 | 1 | 6 | 0 | 1 | 1 | 7 | 6 | 7 | 0 | 45 | 0 | 24 | 0 | 0 | 0 | 16 | 9 | 3 | 7 | 0 | 3 | 2 | 0 | 0 | 0 | 2 | 0 | 1 | 1 | 1 | 0 |
| 9 | 3 | 1 | 8 | 0 | 1 | 1 | 7 | 2 | 3 | 0 | 45 | 0 | 13 | 0 | 0 | 0 | 17 | 3 | 3 | 9 | 0 | 0 | 0 | 0 | 2 | 0 | 0 | 0 | 1 | 1 | 1 | 0 |
| 10 | 3 | 1 | 6 | 0 | 2 | 1 | 1 | 3 | 7 | 0 | 57 | 6 | 21 | 0 | 0 | 0 | 12 | 10 | 4 | 9 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 2 | 0 | 1 | 0 |
| 11 | 3 | 0 | 4 | 0 | 6 | 1 | 17 | 6 | 0 | 18 | 81 | 0 | 26 | 0 | 0 | 0 | 13 | 21 | 21 | 9 | 0 | 3 | 0 | 0 | 2 | 0 | 2 | 0 | 0 | 1 | 1 | 1 |
#general info
df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 56000 entries, 0 to 55999
Data columns (total 32 columns):
race                        56000 non-null int64
gender                      56000 non-null int64
age                         56000 non-null int64
weight                      56000 non-null int64
admission_type_id           56000 non-null int64
discharge_disposition_id    56000 non-null int64
admission_source_id         56000 non-null int64
time_in_hospital            56000 non-null int64
payer_code                  56000 non-null int64
MED_SPEC_NUM                56000 non-null int64
num_lab_procedures          56000 non-null int64
num_procedures              56000 non-null int64
num_medications             56000 non-null int64
number_outpatient           56000 non-null int64
number_emergency            56000 non-null int64
number_inpatient            56000 non-null int64
DIAG_CAT_1                  56000 non-null int64
DIAG_CAT_2                  56000 non-null int64
DIAG_CAT_3                  56000 non-null int64
number_diagnoses            56000 non-null int64
max_glu_serum               56000 non-null int64
A1Cresult                   56000 non-null int64
metformin                   56000 non-null int64
glimepiride                 56000 non-null int64
glipizide                   56000 non-null int64
glyburide                   56000 non-null int64
pioglitazone                56000 non-null int64
rosiglitazone               56000 non-null int64
insulin                     56000 non-null int64
change                      56000 non-null int64
diabetesMed                 56000 non-null int64
readm2                      56000 non-null int64
dtypes: int64(32)
memory usage: 13.7 MB
#basic statistics
df.describe()
| race | gender | age | weight | admission_type_id | discharge_disposition_id | admission_source_id | time_in_hospital | payer_code | MED_SPEC_NUM | num_lab_procedures | num_procedures | num_medications | number_outpatient | number_emergency | number_inpatient | DIAG_CAT_1 | DIAG_CAT_2 | DIAG_CAT_3 | number_diagnoses | max_glu_serum | A1Cresult | metformin | glimepiride | glipizide | glyburide | pioglitazone | rosiglitazone | insulin | change | diabetesMed | readm2 | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| count | 56000.000000 | 56000.000000 | 56000.000000 | 56000.000000 | 56000.000000 | 56000.000000 | 56000.000000 | 56000.000000 | 56000.000000 | 56000.000000 | 56000.000000 | 56000.000000 | 56000.000000 | 56000.000000 | 56000.000000 | 56000.000000 | 56000.000000 | 56000.000000 | 56000.000000 | 56000.000000 | 56000.000000 | 56000.000000 | 56000.000000 | 56000.000000 | 56000.000000 | 56000.000000 | 56000.000000 | 56000.000000 | 56000.000000 | 56000.000000 | 56000.000000 | 56000.000000 |
| mean | 2.602071 | 0.464464 | 6.096589 | 0.123946 | 2.016893 | 3.721821 | 5.756643 | 4.398161 | 4.369375 | 10.668643 | 43.141661 | 1.335893 | 16.009268 | 0.367321 | 0.196875 | 0.637054 | 14.213321 | 12.011054 | 11.357411 | 7.423750 | 0.092089 | 0.368375 | 0.398875 | 0.102536 | 0.254732 | 0.210071 | 0.146161 | 0.125821 | 1.058839 | 0.462679 | 0.769821 | 0.112232 |
| std | 0.937754 | 0.498740 | 1.590761 | 0.712004 | 1.438340 | 5.291517 | 4.053838 | 2.984346 | 4.363828 | 15.595799 | 19.656507 | 1.702009 | 8.132455 | 1.249570 | 0.916820 | 1.270768 | 7.272908 | 7.443902 | 8.157131 | 1.931488 | 0.431655 | 0.890972 | 0.815169 | 0.449274 | 0.678992 | 0.627625 | 0.525985 | 0.490002 | 1.102484 | 0.498610 | 0.420951 | 0.315655 |
| min | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 0.000000 | 0.000000 | 1.000000 | 0.000000 | 1.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 1.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 |
| 25% | 3.000000 | 0.000000 | 5.000000 | 0.000000 | 1.000000 | 1.000000 | 1.000000 | 2.000000 | 0.000000 | 0.000000 | 32.000000 | 0.000000 | 10.000000 | 0.000000 | 0.000000 | 0.000000 | 10.000000 | 4.000000 | 3.000000 | 6.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 1.000000 | 0.000000 |
| 50% | 3.000000 | 0.000000 | 6.000000 | 0.000000 | 1.000000 | 1.000000 | 7.000000 | 4.000000 | 6.000000 | 4.000000 | 44.000000 | 1.000000 | 15.000000 | 0.000000 | 0.000000 | 0.000000 | 15.000000 | 12.000000 | 10.000000 | 8.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 1.000000 | 0.000000 | 1.000000 | 0.000000 |
| 75% | 3.000000 | 1.000000 | 7.000000 | 0.000000 | 3.000000 | 4.000000 | 7.000000 | 6.000000 | 7.000000 | 18.000000 | 57.000000 | 2.000000 | 20.000000 | 0.000000 | 0.000000 | 1.000000 | 18.000000 | 17.000000 | 17.000000 | 9.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 2.000000 | 1.000000 | 1.000000 | 0.000000 |
| max | 5.000000 | 1.000000 | 9.000000 | 9.000000 | 8.000000 | 28.000000 | 25.000000 | 14.000000 | 16.000000 | 63.000000 | 132.000000 | 6.000000 | 75.000000 | 42.000000 | 76.000000 | 21.000000 | 32.000000 | 32.000000 | 32.000000 | 16.000000 | 3.000000 | 3.000000 | 3.000000 | 3.000000 | 3.000000 | 3.000000 | 3.000000 | 3.000000 | 3.000000 | 1.000000 | 1.000000 | 1.000000 |
#correlation analysis
df.corr()
| race | gender | age | weight | admission_type_id | discharge_disposition_id | admission_source_id | time_in_hospital | payer_code | MED_SPEC_NUM | num_lab_procedures | num_procedures | num_medications | number_outpatient | number_emergency | number_inpatient | DIAG_CAT_1 | DIAG_CAT_2 | DIAG_CAT_3 | number_diagnoses | max_glu_serum | A1Cresult | metformin | glimepiride | glipizide | glyburide | pioglitazone | rosiglitazone | insulin | change | diabetesMed | readm2 | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| race | 1.000000 | 0.061706 | 0.114255 | 0.040520 | 0.096587 | 0.005805 | 0.033113 | -0.020364 | 0.041640 | -0.030777 | -0.023193 | 0.024391 | 0.022157 | 0.050845 | -0.012812 | -0.006053 | 0.042924 | 0.029594 | 0.016000 | 0.081672 | 0.054576 | -0.013318 | 0.010548 | 0.008261 | 0.018551 | 0.015784 | 0.026105 | 0.005938 | -0.039862 | 0.008300 | -0.004537 | 0.008626 |
| gender | 0.061706 | 1.000000 | -0.048579 | 0.014491 | 0.014578 | -0.019566 | -0.005222 | -0.031088 | 0.000833 | 0.016623 | -0.004968 | 0.061668 | -0.023819 | -0.005846 | -0.024202 | -0.013405 | -0.034311 | 0.008083 | 0.008343 | -0.007818 | -0.001347 | 0.016539 | 0.001549 | -0.000156 | 0.026810 | 0.034631 | 0.002339 | 0.010843 | 0.000247 | 0.012476 | 0.015391 | -0.003421 |
| age | 0.114255 | -0.048579 | 1.000000 | 0.005716 | -0.005747 | 0.113970 | 0.041070 | 0.107273 | 0.058032 | -0.068202 | 0.025665 | -0.028360 | 0.039010 | 0.029064 | -0.089149 | -0.047012 | 0.091837 | 0.077541 | 0.052021 | 0.243515 | 0.018618 | -0.147559 | -0.060696 | 0.044360 | 0.055867 | 0.076798 | 0.013860 | 0.003034 | -0.079078 | -0.037793 | -0.025360 | 0.015077 |
| weight | 0.040520 | 0.014491 | 0.005716 | 1.000000 | 0.037503 | -0.035383 | 0.003026 | 0.023652 | 0.047819 | 0.004630 | 0.090456 | 0.018693 | 0.011274 | 0.104440 | 0.003706 | -0.009154 | 0.023982 | 0.031824 | 0.014000 | 0.054391 | -0.037139 | -0.021109 | 0.007304 | 0.013694 | 0.017062 | 0.008707 | 0.026059 | 0.004232 | -0.076697 | -0.041219 | -0.030585 | -0.001510 |
| admission_type_id | 0.096587 | 0.014578 | -0.005747 | 0.037503 | 1.000000 | 0.085986 | 0.098007 | -0.014285 | -0.136863 | 0.185351 | -0.145869 | 0.131923 | 0.075711 | 0.030746 | -0.018190 | -0.032648 | 0.032151 | -0.005648 | -0.008918 | -0.113991 | 0.352793 | -0.043929 | 0.008631 | -0.003178 | 0.007991 | -0.002804 | 0.018570 | 0.022930 | -0.025368 | 0.003992 | -0.003930 | -0.007755 |
| discharge_disposition_id | 0.005805 | -0.019566 | 0.113970 | -0.035383 | 0.085986 | 1.000000 | 0.016614 | 0.161954 | -0.123220 | -0.024028 | 0.022906 | 0.015536 | 0.105415 | -0.006101 | -0.024692 | 0.019240 | 0.034616 | 0.029774 | 0.024778 | 0.049496 | 0.037086 | -0.020713 | -0.008376 | -0.022360 | -0.013379 | 0.048256 | -0.014116 | -0.001694 | -0.041842 | -0.014047 | -0.029452 | 0.051471 |
| admission_source_id | 0.033113 | -0.005222 | 0.041070 | 0.003026 | 0.098007 | 0.016614 | 1.000000 | -0.006996 | -0.100157 | -0.152760 | 0.046823 | -0.137044 | -0.055016 | 0.028833 | 0.061938 | 0.033697 | -0.007753 | -0.019796 | 0.001447 | 0.076318 | 0.412356 | 0.006512 | -0.033283 | -0.026685 | 0.009300 | 0.004919 | -0.005729 | -0.008894 | 0.005094 | 0.002583 | 0.000535 | 0.005589 |
| time_in_hospital | -0.020364 | -0.031088 | 0.107273 | 0.023652 | -0.014285 | 0.161954 | -0.006996 | 1.000000 | -0.037805 | 0.023146 | 0.318234 | 0.193139 | 0.468752 | -0.003410 | -0.005467 | 0.079929 | -0.019913 | 0.086503 | 0.068677 | 0.224265 | 0.029079 | 0.058088 | -0.009071 | 0.016086 | 0.016737 | 0.023482 | 0.008521 | 0.008531 | 0.101223 | 0.112359 | 0.059464 | 0.041582 |
| payer_code | 0.041640 | 0.000833 | 0.058032 | 0.047819 | -0.136863 | -0.123220 | -0.100157 | -0.037805 | 1.000000 | -0.082746 | -0.049680 | -0.047581 | 0.005658 | 0.062572 | 0.067316 | 0.009598 | 0.008458 | 0.036335 | 0.033135 | 0.076424 | -0.095739 | -0.006824 | 0.027596 | 0.038055 | 0.005875 | -0.047599 | 0.034867 | -0.008782 | 0.115265 | 0.121010 | 0.077597 | -0.007707 |
| MED_SPEC_NUM | -0.030777 | 0.016623 | -0.068202 | 0.004630 | 0.185351 | -0.024028 | -0.152760 | 0.023146 | -0.082746 | 1.000000 | -0.068863 | 0.076952 | 0.036943 | -0.051445 | -0.009879 | -0.013909 | 0.018820 | -0.019354 | -0.015192 | -0.176693 | -0.003316 | -0.009813 | 0.023068 | 0.012798 | 0.007273 | -0.005929 | 0.002210 | 0.016639 | -0.014342 | -0.005111 | -0.002299 | -0.014395 |
| num_lab_procedures | -0.023193 | -0.004968 | 0.025665 | 0.090456 | -0.145869 | 0.022906 | 0.046823 | 0.318234 | -0.049680 | -0.068863 | 1.000000 | 0.055081 | 0.267707 | -0.008437 | 0.000613 | 0.037763 | -0.071046 | 0.011204 | 0.011021 | 0.149116 | -0.124907 | 0.236383 | -0.044042 | 0.005344 | 0.012450 | -0.001768 | -0.015599 | -0.010260 | 0.085401 | 0.062801 | 0.030903 | 0.018358 |
| num_procedures | 0.024391 | 0.061668 | -0.028360 | 0.018693 | 0.131923 | 0.015536 | -0.137044 | 0.193139 | -0.047581 | 0.076952 | 0.055081 | 1.000000 | 0.387685 | -0.028257 | -0.033659 | -0.061114 | -0.056866 | 0.036607 | 0.025920 | 0.074394 | -0.069910 | -0.017477 | -0.038122 | 0.007223 | 0.004999 | 0.001531 | 0.016471 | 0.018742 | 0.015020 | 0.005976 | -0.009904 | -0.013332 |
| num_medications | 0.022157 | -0.023819 | 0.039010 | 0.011274 | 0.075711 | 0.105415 | -0.055016 | 0.468752 | 0.005658 | 0.036943 | 0.267707 | 0.387685 | 1.000000 | 0.047313 | 0.017129 | 0.066793 | 0.004288 | 0.084268 | 0.063166 | 0.263311 | 0.001639 | 0.013044 | 0.069433 | 0.045223 | 0.056985 | 0.030886 | 0.071584 | 0.052860 | 0.198963 | 0.248529 | 0.186247 | 0.032318 |
| number_outpatient | 0.050845 | -0.005846 | 0.029064 | 0.104440 | 0.030746 | -0.006101 | 0.028833 | -0.003410 | 0.062572 | -0.051445 | -0.008437 | -0.028257 | 0.047313 | 1.000000 | 0.087824 | 0.103471 | -0.009347 | 0.028015 | 0.026595 | 0.093518 | 0.054949 | -0.024324 | -0.013006 | -0.009039 | 0.010527 | -0.000482 | 0.012212 | -0.001550 | 0.010029 | 0.027105 | 0.017340 | 0.017448 |
| number_emergency | -0.012812 | -0.024202 | -0.089149 | 0.003706 | -0.018190 | -0.024692 | 0.061938 | -0.005467 | 0.067316 | -0.009879 | 0.000613 | -0.033659 | 0.017129 | 0.087824 | 1.000000 | 0.279626 | -0.023803 | -0.004155 | 0.007427 | 0.059398 | 0.035679 | -0.004270 | -0.009572 | 0.003318 | -0.003426 | -0.027870 | -0.001978 | -0.006844 | 0.048501 | 0.041797 | 0.029415 | 0.053723 |
| number_inpatient | -0.006053 | -0.013405 | -0.047012 | -0.009154 | -0.032648 | 0.019240 | 0.033697 | 0.079929 | 0.009598 | -0.013909 | 0.037763 | -0.061114 | 0.066793 | 0.103471 | 0.279626 | 1.000000 | -0.004620 | 0.024244 | 0.032150 | 0.102473 | 0.038503 | -0.049379 | -0.073780 | -0.016545 | -0.022736 | -0.036659 | -0.026804 | -0.021471 | 0.060505 | 0.025420 | 0.025559 | 0.162676 |
| DIAG_CAT_1 | 0.042924 | -0.034311 | 0.091837 | 0.023982 | 0.032151 | 0.034616 | -0.007753 | -0.019913 | 0.008458 | 0.018820 | -0.071046 | -0.056866 | 0.004288 | -0.009347 | -0.023803 | -0.004620 | 1.000000 | 0.025858 | 0.028021 | 0.046451 | -0.016030 | -0.091392 | 0.033199 | 0.000410 | 0.010541 | 0.017872 | 0.024890 | 0.010041 | -0.075260 | -0.033688 | -0.028985 | -0.006205 |
| DIAG_CAT_2 | 0.029594 | 0.008083 | 0.077541 | 0.031824 | -0.005648 | 0.029774 | -0.019796 | 0.086503 | 0.036335 | -0.019354 | 0.011204 | 0.036607 | 0.084268 | 0.028015 | -0.004155 | 0.024244 | 0.025858 | 1.000000 | 0.081391 | 0.171521 | -0.017962 | -0.044930 | -0.018313 | 0.006773 | 0.004223 | 0.010435 | 0.000030 | -0.010618 | -0.007776 | -0.006439 | -0.010210 | 0.004473 |
| DIAG_CAT_3 | 0.016000 | 0.008343 | 0.052021 | 0.014000 | -0.008918 | 0.024778 | 0.001447 | 0.068677 | 0.033135 | -0.015192 | 0.011021 | 0.025920 | 0.063166 | 0.026595 | 0.007427 | 0.032150 | 0.028021 | 0.081391 | 1.000000 | 0.186667 | -0.009693 | -0.031716 | -0.024179 | -0.010677 | -0.005554 | -0.005157 | -0.008180 | -0.003303 | 0.013942 | 0.005824 | -0.007452 | 0.012287 |
| number_diagnoses | 0.081672 | -0.007818 | 0.243515 | 0.054391 | -0.113991 | 0.049496 | 0.076318 | 0.224265 | 0.076424 | -0.176693 | 0.149116 | 0.074394 | 0.263311 | 0.093518 | 0.059398 | 0.102473 | 0.046451 | 0.171521 | 0.186667 | 1.000000 | -0.036161 | -0.032983 | -0.073736 | 0.013640 | -0.005975 | -0.024247 | 0.002278 | -0.011524 | 0.076730 | 0.055250 | 0.019375 | 0.045801 |
| max_glu_serum | 0.054576 | -0.001347 | 0.018618 | -0.037139 | 0.352793 | 0.037086 | 0.412356 | 0.029079 | -0.095739 | -0.003316 | -0.124907 | -0.069910 | 0.001639 | 0.054949 | 0.035679 | 0.038503 | -0.016030 | -0.017962 | -0.009693 | -0.036161 | 1.000000 | -0.043540 | -0.029790 | -0.031840 | 0.005931 | 0.000373 | -0.014531 | -0.009275 | 0.000884 | 0.008958 | -0.005206 | 0.011038 |
| A1Cresult | -0.013318 | 0.016539 | -0.147559 | -0.021109 | -0.043929 | -0.020713 | 0.006512 | 0.058088 | -0.006824 | -0.009813 | 0.236383 | -0.017477 | 0.013044 | -0.024324 | -0.004270 | -0.049379 | -0.091392 | -0.044930 | -0.031716 | -0.032983 | -0.043540 | 1.000000 | 0.051894 | 0.022787 | 0.020844 | 0.009977 | 0.000223 | 0.009548 | 0.107227 | 0.105614 | 0.086291 | -0.014302 |
| metformin | 0.010548 | 0.001549 | -0.060696 | 0.007304 | 0.008631 | -0.008376 | -0.033283 | -0.009071 | 0.027596 | 0.023068 | -0.044042 | -0.038122 | 0.069433 | -0.013006 | -0.009572 | -0.073780 | 0.033199 | -0.018313 | -0.024179 | -0.073736 | -0.029790 | 0.051894 | 1.000000 | 0.047475 | 0.077111 | 0.129061 | 0.060566 | 0.097708 | -0.017392 | 0.325302 | 0.267566 | -0.027269 |
| glimepiride | 0.008261 | -0.000156 | 0.044360 | 0.013694 | -0.003178 | -0.022360 | -0.026685 | 0.016086 | 0.038055 | 0.012798 | 0.005344 | 0.007223 | 0.045223 | -0.009039 | 0.003318 | -0.016545 | 0.000410 | 0.006773 | -0.010677 | 0.013640 | -0.031840 | 0.022787 | 0.047475 | 1.000000 | -0.071983 | -0.067334 | 0.042601 | 0.038655 | 0.012479 | 0.138970 | 0.124797 | -0.001818 |
| glipizide | 0.018551 | 0.026810 | 0.055867 | 0.017062 | 0.007991 | -0.013379 | 0.009300 | 0.016737 | 0.005875 | 0.007273 | 0.012450 | 0.004999 | 0.056985 | 0.010527 | -0.003426 | -0.022736 | 0.010541 | 0.004223 | -0.005554 | -0.005975 | 0.005931 | 0.020844 | 0.077111 | -0.071983 | 1.000000 | -0.104495 | 0.049752 | 0.041498 | -0.027179 | 0.194260 | 0.205145 | 0.003333 |
| glyburide | 0.015784 | 0.034631 | 0.076798 | 0.008707 | -0.002804 | 0.048256 | 0.004919 | 0.023482 | -0.047599 | -0.005929 | -0.001768 | 0.001531 | 0.030886 | -0.000482 | -0.027870 | -0.036659 | 0.017872 | 0.010435 | -0.005157 | -0.024247 | 0.000373 | 0.009977 | 0.129061 | -0.067334 | -0.104495 | 1.000000 | 0.027727 | 0.030766 | -0.071853 | 0.172392 | 0.183024 | -0.004985 |
| pioglitazone | 0.026105 | 0.002339 | 0.013860 | 0.026059 | 0.018570 | -0.014116 | -0.005729 | 0.008521 | 0.034867 | 0.002210 | -0.015599 | 0.016471 | 0.071584 | 0.012212 | -0.001978 | -0.026804 | 0.024890 | 0.000030 | -0.008180 | 0.002278 | -0.014531 | 0.000223 | 0.060566 | 0.042601 | 0.049752 | 0.027727 | 1.000000 | -0.062763 | 0.003954 | 0.203180 | 0.151949 | -0.005122 |
| rosiglitazone | 0.005938 | 0.010843 | 0.003034 | 0.004232 | 0.022930 | -0.001694 | -0.008894 | 0.008531 | -0.008782 | 0.016639 | -0.010260 | 0.018742 | 0.052860 | -0.001550 | -0.006844 | -0.021471 | 0.010041 | -0.010618 | -0.003303 | -0.011524 | -0.009275 | 0.009548 | 0.097708 | 0.038655 | 0.041498 | 0.030766 | -0.062763 | 1.000000 | 0.004080 | 0.191641 | 0.140410 | -0.009096 |
| insulin | -0.039862 | 0.000247 | -0.079078 | -0.076697 | -0.025368 | -0.041842 | 0.005094 | 0.101223 | 0.115265 | -0.014342 | 0.085401 | 0.015020 | 0.198963 | 0.010029 | 0.048501 | 0.060505 | -0.075260 | -0.007776 | 0.013942 | 0.076730 | 0.000884 | 0.107227 | -0.017392 | 0.012479 | -0.027179 | -0.071853 | 0.003954 | 0.004080 | 1.000000 | 0.461502 | 0.525169 | 0.024743 |
| change | 0.008300 | 0.012476 | -0.037793 | -0.041219 | 0.003992 | -0.014047 | 0.002583 | 0.112359 | 0.121010 | -0.005111 | 0.062801 | 0.005976 | 0.248529 | 0.027105 | 0.041797 | 0.025420 | -0.033688 | -0.006439 | 0.005824 | 0.055250 | 0.008958 | 0.105614 | 0.325302 | 0.138970 | 0.194260 | 0.172392 | 0.203180 | 0.191641 | 0.461502 | 1.000000 | 0.507411 | 0.018728 |
| diabetesMed | -0.004537 | 0.015391 | -0.025360 | -0.030585 | -0.003930 | -0.029452 | 0.000535 | 0.059464 | 0.077597 | -0.002299 | 0.030903 | -0.009904 | 0.186247 | 0.017340 | 0.029415 | 0.025559 | -0.028985 | -0.010210 | -0.007452 | 0.019375 | -0.005206 | 0.086291 | 0.267566 | 0.124797 | 0.205145 | 0.183024 | 0.151949 | 0.140410 | 0.525169 | 0.507411 | 1.000000 | 0.024819 |
| readm2 | 0.008626 | -0.003421 | 0.015077 | -0.001510 | -0.007755 | 0.051471 | 0.005589 | 0.041582 | -0.007707 | -0.014395 | 0.018358 | -0.013332 | 0.032318 | 0.017448 | 0.053723 | 0.162676 | -0.006205 | 0.004473 | 0.012287 | 0.045801 | 0.011038 | -0.014302 | -0.027269 | -0.001818 | 0.003333 | -0.004985 | -0.005122 | -0.009096 | 0.024743 | 0.018728 | 0.024819 | 1.000000 |
# heatmap for correlation
plt.figure(figsize=(35,36))
sns.heatmap(df.corr(), annot=True)
Decision Tree Model Building, Validation, Evaluation¶
- Remember the model should be "simple, but not too simple"
Split the Data into Two Sets¶
- Training Set
- Test Set
# Set Y and X
y = df['readm2']
X = df.drop(['readm2'], axis=1)
X.head(12)
| race | gender | age | weight | admission_type_id | discharge_disposition_id | admission_source_id | time_in_hospital | payer_code | MED_SPEC_NUM | num_lab_procedures | num_procedures | num_medications | number_outpatient | number_emergency | number_inpatient | DIAG_CAT_1 | DIAG_CAT_2 | DIAG_CAT_3 | number_diagnoses | max_glu_serum | A1Cresult | metformin | glimepiride | glipizide | glyburide | pioglitazone | rosiglitazone | insulin | change | diabetesMed | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 3 | 0 | 8 | 0 | 3 | 1 | 4 | 5 | 0 | 0 | 39 | 3 | 11 | 0 | 0 | 0 | 10 | 4 | 18 | 7 | 0 | 0 | 0 | 0 | 0 | 2 | 0 | 0 | 0 | 0 | 1 |
| 1 | 3 | 0 | 7 | 0 | 5 | 3 | 1 | 6 | 7 | 0 | 79 | 1 | 25 | 3 | 0 | 0 | 16 | 13 | 16 | 9 | 0 | 0 | 2 | 0 | 0 | 2 | 0 | 0 | 3 | 1 | 1 |
| 2 | 5 | 0 | 6 | 0 | 1 | 22 | 7 | 4 | 14 | 18 | 29 | 2 | 18 | 0 | 0 | 1 | 24 | 18 | 2 | 9 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 2 | 0 | 1 |
| 3 | 3 | 1 | 7 | 0 | 1 | 1 | 7 | 3 | 0 | 18 | 72 | 3 | 18 | 0 | 0 | 0 | 17 | 4 | 3 | 9 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 2 | 0 | 1 |
| 4 | 3 | 0 | 3 | 0 | 2 | 1 | 1 | 3 | 0 | 0 | 21 | 1 | 6 | 0 | 0 | 0 | 23 | 18 | 32 | 9 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 5 | 3 | 1 | 7 | 0 | 2 | 1 | 1 | 2 | 0 | 18 | 4 | 0 | 7 | 0 | 0 | 0 | 14 | 9 | 3 | 8 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 6 | 3 | 0 | 4 | 0 | 1 | 1 | 7 | 6 | 14 | 33 | 89 | 0 | 25 | 0 | 2 | 1 | 25 | 10 | 16 | 9 | 0 | 0 | 2 | 0 | 0 | 0 | 0 | 0 | 2 | 1 | 1 |
| 7 | 3 | 0 | 7 | 0 | 1 | 6 | 7 | 4 | 6 | 0 | 63 | 0 | 22 | 0 | 2 | 4 | 16 | 3 | 3 | 5 | 0 | 0 | 2 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 |
| 8 | 3 | 1 | 6 | 0 | 1 | 1 | 7 | 6 | 7 | 0 | 45 | 0 | 24 | 0 | 0 | 0 | 16 | 9 | 3 | 7 | 0 | 3 | 2 | 0 | 0 | 0 | 2 | 0 | 1 | 1 | 1 |
| 9 | 3 | 1 | 8 | 0 | 1 | 1 | 7 | 2 | 3 | 0 | 45 | 0 | 13 | 0 | 0 | 0 | 17 | 3 | 3 | 9 | 0 | 0 | 0 | 0 | 2 | 0 | 0 | 0 | 1 | 1 | 1 |
| 10 | 3 | 1 | 6 | 0 | 2 | 1 | 1 | 3 | 7 | 0 | 57 | 6 | 21 | 0 | 0 | 0 | 12 | 10 | 4 | 9 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 2 | 0 | 1 |
| 11 | 3 | 0 | 4 | 0 | 6 | 1 | 17 | 6 | 0 | 18 | 81 | 0 | 26 | 0 | 0 | 0 | 13 | 21 | 21 | 9 | 0 | 3 | 0 | 0 | 2 | 0 | 2 | 0 | 0 | 1 | 1 |
# evaluate the model by splitting into train (70%) and test sets (30%)
# http://scikit-learn.org/stable/modules/generated/sklearn.model_selection.train_test_split.html
# name the model as "dt"
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)
dt = DecisionTreeClassifier()
dt.fit(X_train, y_train)
DecisionTreeClassifier(class_weight=None, criterion='gini', max_depth=None,
max_features=None, max_leaf_nodes=None, min_samples_leaf=1,
min_samples_split=2, min_weight_fraction_leaf=0.0,
presort=False, random_state=None, splitter='best')
#Model evaluation
# http://scikit-learn.org/stable/modules/model_evaluation.html
print(metrics.accuracy_score(y_test, dt.predict(X_test)))
print(metrics.confusion_matrix(y_test, dt.predict(X_test)))
print(metrics.classification_report(y_test, dt.predict(X_test)))
print(metrics.roc_auc_score(y_test, dt.predict(X_test)))
# y_test holds the actual y values in the test set
# dt.predict(X_test) holds the y values predicted by the model
# The more often the two agree, the more accurate the model is.
0.796428571429
[[13017 1937]
[ 1483 363]]
precision recall f1-score support
0 0.90 0.87 0.88 14954
1 0.16 0.20 0.18 1846
avg / total 0.82 0.80 0.81 16800
0.533555413199
Question: Interpret the results of confusion matrix¶
- 13017 correctly classified as not readmitted (true negatives).
- 1937 misclassified as readmitted, but actually not readmitted (false positives).
- 363 correctly classified as readmitted (true positives).
- 1483 misclassified as not readmitted, but actually readmitted (false negatives).
- Model accuracy is therefore: (13017+363) / (13017+1937+1483+363) = 13380/16800 = 0.7964, matching the accuracy score printed above. Roughly 79.64% accuracy can be expected on similar unseen data, but because only about 11% of patients are readmitted, accuracy alone overstates how useful the model is: it finds just 363 of the 1846 actual readmissions (recall 0.20).
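As a cross-check, the headline metrics can be recomputed by hand from the four cells of the confusion matrix printed above. The sketch below is plain Python; the variable names (`tn`, `fp`, `fn`, `tp`, and so on) are chosen here for illustration and are not part of the original notebook.

```python
# Confusion matrix from the full-grown tree output:
# rows are actual classes, columns are predicted classes.
tn, fp = 13017, 1937   # actual 0: predicted 0, predicted 1
fn, tp = 1483, 363     # actual 1: predicted 0, predicted 1

total = tn + fp + fn + tp

# Share of all test rows that were classified correctly.
accuracy = (tn + tp) / total

# Of the patients flagged as readmitted, how many really were.
precision_1 = tp / (tp + fp)

# Of the patients actually readmitted, how many the model found.
recall_1 = tp / (tp + fn)

# Harmonic mean of precision and recall for class 1.
f1_1 = 2 * precision_1 * recall_1 / (precision_1 + recall_1)

print(f"total       = {total}")
print(f"accuracy    = {accuracy:.4f}")
print(f"precision_1 = {precision_1:.2f}")
print(f"recall_1    = {recall_1:.2f}")
print(f"f1_1        = {f1_1:.2f}")
```

The class 1 values agree with the `classification_report` row for class 1 (0.16, 0.20, 0.18), which is a quick way to confirm that the matrix is being read in the right orientation.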
Visualizing decision tree¶
- There are two methods for this. You can use either method.
- Using Graphviz software. For this option, you need to have Graphviz installed on your machine.
# Graphviz
tree.export_graphviz(dt, out_file='data/decisiontree.dot', feature_names=X.columns)
# This is a "full-grown" tree
from IPython.display import Image
Image("data/decisiontree.png")
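`export_graphviz` only writes a `.dot` text file, so the PNG displayed above has to be rendered separately. One possible way to do that, assuming the Graphviz `dot` executable is on the PATH, is sketched below. The helper names `build_dot_command` and `render_dot` are hypothetical and not part of the original notebook.

```python
import shutil
import subprocess


def build_dot_command(dot_path, png_path):
    """Return the Graphviz command line that renders a .dot file to PNG."""
    return ["dot", "-Tpng", dot_path, "-o", png_path]


def render_dot(dot_path, png_path):
    """Render with Graphviz if `dot` is available; return True if it ran."""
    if shutil.which("dot") is None:
        return False  # Graphviz is not installed or not on PATH
    subprocess.run(build_dot_command(dot_path, png_path), check=True)
    return True


# Show the command that would be run for the full-grown tree.
command = build_dot_command("data/decisiontree.dot", "data/decisiontree.png")
print(" ".join(command))
```

Calling `render_dot("data/decisiontree.dot", "data/decisiontree.png")` after the export step would then produce the image that `Image(...)` loads. The same command can also be typed directly in a terminal.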
Interpreting decision tree¶
- Practical Sized Decision Tree
# Set Y and X
y = df['readm2']
X = df.drop(['readm2'], axis=1)
# max_depth=3 limits the depth of the tree; without it you get a full-grown tree, which overfits
# You can make a simpler decision tree
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)
dt_simple = DecisionTreeClassifier(max_depth=3, min_samples_leaf=5)
dt_simple.fit(X_train, y_train)
# max_depth : The maximum depth of the tree. If None, then nodes are expanded until all leaves are pure or until all leaves contain less than min_samples_split samples.
# min_samples_leaf : The minimum number of samples required to be at a leaf node
# http://scikit-learn.org/stable/modules/generated/sklearn.tree.DecisionTreeClassifier.html#sklearn.tree.DecisionTreeClassifier
DecisionTreeClassifier(class_weight=None, criterion='gini', max_depth=3,
max_features=None, max_leaf_nodes=None, min_samples_leaf=5,
min_samples_split=2, min_weight_fraction_leaf=0.0,
presort=False, random_state=None, splitter='best')
# Find out the performance of this model & interpret the results
print(metrics.accuracy_score(y_test, dt_simple.predict(X_test)))
print(metrics.confusion_matrix(y_test, dt_simple.predict(X_test)))
print(metrics.classification_report(y_test, dt_simple.predict(X_test)))
print(metrics.roc_auc_score(y_test, dt_simple.predict(X_test)))
0.889761904762
[[14946 8]
[ 1844 2]]
precision recall f1-score support
0 0.89 1.00 0.94 14954
1 0.20 0.00 0.00 1846
avg / total 0.81 0.89 0.84 16800
0.500274224849
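The higher accuracy of the simpler tree deserves a second look. The sketch below, in plain Python with illustrative variable names, compares it with a baseline that always predicts "not readmitted". It also reproduces the last printed number: when `roc_auc_score` is given hard 0/1 predictions rather than probabilities, the result equals the average of the true positive rate and the true negative rate.

```python
# Confusion matrix from the dt_simple output:
# rows are actual classes, columns are predicted classes.
tn, fp = 14946, 8
fn, tp = 1844, 2

total = tn + fp + fn + tp

# Accuracy of the simple tree.
accuracy = (tn + tp) / total

# Accuracy of a trivial baseline that predicts class 0 for everyone.
baseline_accuracy = (tn + fp) / total

# Per-class hit rates.
tnr = tn / (tn + fp)   # true negative rate (specificity)
tpr = tp / (tp + fn)   # true positive rate (recall for class 1)

# With hard 0/1 predictions, ROC AUC reduces to this average.
balanced_accuracy = (tpr + tnr) / 2

print(f"accuracy          = {accuracy:.6f}")
print(f"baseline_accuracy = {baseline_accuracy:.6f}")
print(f"tpr               = {tpr:.6f}")
print(f"balanced_accuracy = {balanced_accuracy:.6f}")
```

Here the simple tree scores slightly below the always-0 baseline and identifies only 2 of the 1846 readmitted patients, so its accuracy of about 0.89 mostly reflects the class imbalance rather than predictive skill.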
# Visualize the simpler decision tree model (dt_simple)
tree.export_graphviz(dt_simple, out_file='data/decisiontree_simple.dot', feature_names=X.columns)
# Embed decision tree
from IPython.display import Image
Image("data/decisiontree_simple.png")
Model Deployment: Predict y values¶
- Load Challenge_1_Validation_Work.csv (the scoring dataset).
- This dataset has no y value; it represents future patients whose outcome is not yet known.
- Apply your decision tree model to find out who is likely to be readmitted.
#import scoring data
#no Y value in this dataset ...
#we are trying to predict whether the people in this scoring dataset are likely to be readmitted <30 days or not
score = pd.read_csv('data/Challenge_1_Validation_Work.csv')
score.head(5)
| encounter_id | patient_nbr | race | gender | age | weight | admission_type_id | discharge_disposition_id | admission_source_id | time_in_hospital | payer_code | medical_specialty | MED_SPEC_NUM | num_lab_procedures | num_procedures | num_medications | number_outpatient | number_emergency | number_inpatient | diag_1 | DIAG_CAT_1 | diag_2 | DIAG_CAT_2 | diag_3 | DIAG_CAT_3 | number_diagnoses | max_glu_serum | A1Cresult | metformin | repaglinide | nateglinide | chlorpropamide | glimepiride | acetohexamide | glipizide | glyburide | tolbutamide | pioglitazone | rosiglitazone | acarbose | miglitol | troglitazone | tolazamide | examide | citoglipton | insulin | glyburide.metformin | glipizide.metformin | glimepiride.pioglitazone | metformin.rosiglitazone | metformin.pioglitazone | change | diabetesMed | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 166684116 | 25357527 | Caucasian | Male | [40-50) | ? | 3 | 11 | 1 | 2 | MC | Nephrology | 19 | 48 | 4 | 11 | 0 | 0 | 0 | 518 | 16 | 431 | 14 | 427 | 12 | 9 | None | None | No | No | No | No | No | No | No | No | No | No | No | No | No | No | No | No | No | No | No | No | No | No | No | No | No |
| 1 | 118577202 | 111860712 | Caucasian | Female | [40-50) | ? | 1 | 1 | 7 | 2 | BC | ? | 0 | 31 | 1 | 28 | 0 | 0 | 0 | 996 | 27 | 250 | 3 | 530 | 17 | 5 | None | None | Steady | No | No | No | No | No | No | Steady | No | No | Steady | No | No | No | No | No | No | Up | No | No | No | No | No | Ch | Yes |
| 2 | 44006898 | 20621907 | Caucasian | Male | [60-70) | ? | 6 | 7 | 7 | 1 | ? | InternalMedicine | 18 | 42 | 0 | 12 | 0 | 0 | 0 | 786 | 23 | V42 | 32 | 250 | 3 | 3 | >200 | None | No | No | No | No | No | No | No | Steady | No | No | No | No | No | No | No | No | No | No | No | No | No | No | No | No | Yes |
| 3 | 55615428 | 110263914 | Caucasian | Female | [70-80) | ? | 1 | 1 | 7 | 5 | ? | ? | 0 | 52 | 2 | 25 | 1 | 1 | 0 | 820 | 24 | 427 | 12 | 428 | 13 | 9 | None | None | No | Steady | No | No | No | No | No | No | No | No | No | No | No | No | No | No | No | Steady | No | No | No | No | No | Ch | Yes |
| 4 | 201098010 | 96526827 | Caucasian | Female | [40-50) | ? | 2 | 1 | 7 | 2 | SP | Surgery-General | 55 | 41 | 2 | 3 | 0 | 0 | 0 | 455 | 15 | 535 | 17 | 211 | 2 | 9 | None | None | No | No | No | No | No | No | No | No | No | No | No | No | No | No | No | No | No | No | No | No | No | No | No | No | No |
# drop the columns 'encounter_id', 'patient_nbr' and 'medical_specialty' since they are not used in the analysis, then display the result
score = score.drop('encounter_id', axis=1)
score = score.drop('patient_nbr', axis=1)
score = score.drop('medical_specialty', axis=1)
# drop the columns 'diag_1', 'diag_2' and 'diag_3' since these values have already been put into categories
# in columns 'DIAG_CAT_1', 'DIAG_CAT_2' and 'DIAG_CAT_3'
score = score.drop('diag_1', axis=1)
score = score.drop('diag_2', axis=1)
score = score.drop('diag_3', axis=1)
score.head(5)
| race | gender | age | weight | admission_type_id | discharge_disposition_id | admission_source_id | time_in_hospital | payer_code | MED_SPEC_NUM | num_lab_procedures | num_procedures | num_medications | number_outpatient | number_emergency | number_inpatient | DIAG_CAT_1 | DIAG_CAT_2 | DIAG_CAT_3 | number_diagnoses | max_glu_serum | A1Cresult | metformin | repaglinide | nateglinide | chlorpropamide | glimepiride | acetohexamide | glipizide | glyburide | tolbutamide | pioglitazone | rosiglitazone | acarbose | miglitol | troglitazone | tolazamide | examide | citoglipton | insulin | glyburide.metformin | glipizide.metformin | glimepiride.pioglitazone | metformin.rosiglitazone | metformin.pioglitazone | change | diabetesMed | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Caucasian | Male | [40-50) | ? | 3 | 11 | 1 | 2 | MC | 19 | 48 | 4 | 11 | 0 | 0 | 0 | 16 | 14 | 12 | 9 | None | None | No | No | No | No | No | No | No | No | No | No | No | No | No | No | No | No | No | No | No | No | No | No | No | No | No |
| 1 | Caucasian | Female | [40-50) | ? | 1 | 1 | 7 | 2 | BC | 0 | 31 | 1 | 28 | 0 | 0 | 0 | 27 | 3 | 17 | 5 | None | None | Steady | No | No | No | No | No | No | Steady | No | No | Steady | No | No | No | No | No | No | Up | No | No | No | No | No | Ch | Yes |
| 2 | Caucasian | Male | [60-70) | ? | 6 | 7 | 7 | 1 | ? | 18 | 42 | 0 | 12 | 0 | 0 | 0 | 23 | 32 | 3 | 3 | >200 | None | No | No | No | No | No | No | No | Steady | No | No | No | No | No | No | No | No | No | No | No | No | No | No | No | No | Yes |
| 3 | Caucasian | Female | [70-80) | ? | 1 | 1 | 7 | 5 | ? | 0 | 52 | 2 | 25 | 1 | 1 | 0 | 24 | 12 | 13 | 9 | None | None | No | Steady | No | No | No | No | No | No | No | No | No | No | No | No | No | No | No | Steady | No | No | No | No | No | Ch | Yes |
| 4 | Caucasian | Female | [40-50) | ? | 2 | 1 | 7 | 2 | SP | 55 | 41 | 2 | 3 | 0 | 0 | 0 | 15 | 17 | 2 | 9 | None | None | No | No | No | No | No | No | No | No | No | No | No | No | No | No | No | No | No | No | No | No | No | No | No | No | No |
#replace the values of the 'race' column:
# ? = 0
# AfricanAmerican = 1
# Asian = 2
# Caucasian = 3
# Hispanic = 4
# Other = 5
score = score.replace({'race': {'?': 0, 'AfricanAmerican': 1, 'Asian': 2,'Caucasian': 3,'Hispanic': 4,'Other': 5}})
score.head(2)
| race | gender | age | weight | admission_type_id | discharge_disposition_id | admission_source_id | time_in_hospital | payer_code | MED_SPEC_NUM | num_lab_procedures | num_procedures | num_medications | number_outpatient | number_emergency | number_inpatient | DIAG_CAT_1 | DIAG_CAT_2 | DIAG_CAT_3 | number_diagnoses | max_glu_serum | A1Cresult | metformin | repaglinide | nateglinide | chlorpropamide | glimepiride | acetohexamide | glipizide | glyburide | tolbutamide | pioglitazone | rosiglitazone | acarbose | miglitol | troglitazone | tolazamide | examide | citoglipton | insulin | glyburide.metformin | glipizide.metformin | glimepiride.pioglitazone | metformin.rosiglitazone | metformin.pioglitazone | change | diabetesMed | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 3 | Male | [40-50) | ? | 3 | 11 | 1 | 2 | MC | 19 | 48 | 4 | 11 | 0 | 0 | 0 | 16 | 14 | 12 | 9 | None | None | No | No | No | No | No | No | No | No | No | No | No | No | No | No | No | No | No | No | No | No | No | No | No | No | No |
| 1 | 3 | Female | [40-50) | ? | 1 | 1 | 7 | 2 | BC | 0 | 31 | 1 | 28 | 0 | 0 | 0 | 27 | 3 | 17 | 5 | None | None | Steady | No | No | No | No | No | No | Steady | No | No | Steady | No | No | No | No | No | No | Up | No | No | No | No | No | Ch | Yes |
#replace the values of the 'gender' column:
# Female = 0
# Male = 1
score = score.replace({'gender': {'Male': 1, 'Female': 0}})
score.head(2)
| race | gender | age | weight | admission_type_id | discharge_disposition_id | admission_source_id | time_in_hospital | payer_code | MED_SPEC_NUM | num_lab_procedures | num_procedures | num_medications | number_outpatient | number_emergency | number_inpatient | DIAG_CAT_1 | DIAG_CAT_2 | DIAG_CAT_3 | number_diagnoses | max_glu_serum | A1Cresult | metformin | repaglinide | nateglinide | chlorpropamide | glimepiride | acetohexamide | glipizide | glyburide | tolbutamide | pioglitazone | rosiglitazone | acarbose | miglitol | troglitazone | tolazamide | examide | citoglipton | insulin | glyburide.metformin | glipizide.metformin | glimepiride.pioglitazone | metformin.rosiglitazone | metformin.pioglitazone | change | diabetesMed | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 3 | 1 | [40-50) | ? | 3 | 11 | 1 | 2 | MC | 19 | 48 | 4 | 11 | 0 | 0 | 0 | 16 | 14 | 12 | 9 | None | None | No | No | No | No | No | No | No | No | No | No | No | No | No | No | No | No | No | No | No | No | No | No | No | No | No |
| 1 | 3 | 0 | [40-50) | ? | 1 | 1 | 7 | 2 | BC | 0 | 31 | 1 | 28 | 0 | 0 | 0 | 27 | 3 | 17 | 5 | None | None | Steady | No | No | No | No | No | No | Steady | No | No | Steady | No | No | No | No | No | No | Up | No | No | No | No | No | Ch | Yes |
#replace the values of the 'age' column:
# [0-10) = 0
# [10-20) = 1
# [20-30) = 2
# [30-40) = 3
# [40-50) = 4
# [50-60) = 5
# [60-70) = 6
# [70-80) = 7
# [80-90) = 8
# [90-100) = 9
score = score.replace({'age': {'[0-10)': 0, '[10-20)': 1, '[20-30)': 2, '[30-40)': 3, '[40-50)': 4, '[50-60)': 5, '[60-70)': 6, '[70-80)': 7, '[80-90)': 8, '[90-100)': 9}})
score.head(2)
| | race | gender | age | weight | admission_type_id | discharge_disposition_id | admission_source_id | time_in_hospital | payer_code | MED_SPEC_NUM | num_lab_procedures | num_procedures | num_medications | number_outpatient | number_emergency | number_inpatient | DIAG_CAT_1 | DIAG_CAT_2 | DIAG_CAT_3 | number_diagnoses | max_glu_serum | A1Cresult | metformin | repaglinide | nateglinide | chlorpropamide | glimepiride | acetohexamide | glipizide | glyburide | tolbutamide | pioglitazone | rosiglitazone | acarbose | miglitol | troglitazone | tolazamide | examide | citoglipton | insulin | glyburide.metformin | glipizide.metformin | glimepiride.pioglitazone | metformin.rosiglitazone | metformin.pioglitazone | change | diabetesMed |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 3 | 1 | 4 | ? | 3 | 11 | 1 | 2 | MC | 19 | 48 | 4 | 11 | 0 | 0 | 0 | 16 | 14 | 12 | 9 | None | None | No | No | No | No | No | No | No | No | No | No | No | No | No | No | No | No | No | No | No | No | No | No | No | No | No |
| 1 | 3 | 0 | 4 | ? | 1 | 1 | 7 | 2 | BC | 0 | 31 | 1 | 28 | 0 | 0 | 0 | 27 | 3 | 17 | 5 | None | None | Steady | No | No | No | No | No | No | Steady | No | No | Steady | No | No | No | No | No | No | Up | No | No | No | No | No | Ch | Yes |
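The age brackets are ordered, so the codes can also be derived from the bracket order instead of being typed by hand. The following is a minimal sketch on toy data, not the notebook's own method; `AGE_BRACKETS` and `encode_age` are hypothetical names. Unlike `replace`, which leaves an unknown label untouched, `pd.Categorical` marks it with -1.

```python
import pandas as pd

# Age brackets in ascending order; the position in this list becomes the code.
AGE_BRACKETS = ['[%d-%d)' % (lo, lo + 10) for lo in range(0, 100, 10)]

def encode_age(series):
    """Return integer codes 0-9 for the age brackets; unknown labels become -1."""
    cat = pd.Categorical(series, categories=AGE_BRACKETS, ordered=True)
    return pd.Series(cat.codes, index=series.index, name=series.name)

# Toy data for illustration, not taken from the validation file.
toy = pd.Series(['[40-50)', '[0-10)', '[90-100)', 'unknown'], name='age')
codes = encode_age(toy)
print(codes.tolist())
```

A -1 in the result is a quick signal that a label was not in the expected bracket list.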
#replace the values of the 'weight' column:
# ? = 0
# [0-25) = 1
# [25-50) = 2
# [50-75) = 3
# [75-100) = 4
# [100-125) = 5
# [125-150) = 6
# [150-175) = 7
# [175-200) = 8
# >200 = 9
score = score.replace({'weight': {'?': 0, '[0-25)': 1, '[25-50)': 2, '[50-75)': 3, '[75-100)': 4, '[100-125)': 5, '[125-150)': 6, '[150-175)': 7, '[175-200)': 8, '>200': 9}})
score.head(2)
| | race | gender | age | weight | admission_type_id | discharge_disposition_id | admission_source_id | time_in_hospital | payer_code | MED_SPEC_NUM | num_lab_procedures | num_procedures | num_medications | number_outpatient | number_emergency | number_inpatient | DIAG_CAT_1 | DIAG_CAT_2 | DIAG_CAT_3 | number_diagnoses | max_glu_serum | A1Cresult | metformin | repaglinide | nateglinide | chlorpropamide | glimepiride | acetohexamide | glipizide | glyburide | tolbutamide | pioglitazone | rosiglitazone | acarbose | miglitol | troglitazone | tolazamide | examide | citoglipton | insulin | glyburide.metformin | glipizide.metformin | glimepiride.pioglitazone | metformin.rosiglitazone | metformin.pioglitazone | change | diabetesMed |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 3 | 1 | 4 | 0 | 3 | 11 | 1 | 2 | MC | 19 | 48 | 4 | 11 | 0 | 0 | 0 | 16 | 14 | 12 | 9 | None | None | No | No | No | No | No | No | No | No | No | No | No | No | No | No | No | No | No | No | No | No | No | No | No | No | No |
| 1 | 3 | 0 | 4 | 0 | 1 | 1 | 7 | 2 | BC | 0 | 31 | 1 | 28 | 0 | 0 | 0 | 27 | 3 | 17 | 5 | None | None | Steady | No | No | No | No | No | No | Steady | No | No | Steady | No | No | No | No | No | No | Up | No | No | No | No | No | Ch | Yes |
#replace the values of the 'payer_code' column:
score = score.replace({'payer_code': {'?': 0, 'BC': 1, 'CH': 2, 'CM': 3, 'CP': 4, 'DM': 5, 'HM': 6, 'MC': 7, 'MD': 8, 'MP': 9, 'OG': 10, 'OT': 11, 'PO': 12, 'SI': 13, 'SP': 14, 'UN': 15, 'WC': 16, 'FR': 19}})
score.head(2)
| | race | gender | age | weight | admission_type_id | discharge_disposition_id | admission_source_id | time_in_hospital | payer_code | MED_SPEC_NUM | num_lab_procedures | num_procedures | num_medications | number_outpatient | number_emergency | number_inpatient | DIAG_CAT_1 | DIAG_CAT_2 | DIAG_CAT_3 | number_diagnoses | max_glu_serum | A1Cresult | metformin | repaglinide | nateglinide | chlorpropamide | glimepiride | acetohexamide | glipizide | glyburide | tolbutamide | pioglitazone | rosiglitazone | acarbose | miglitol | troglitazone | tolazamide | examide | citoglipton | insulin | glyburide.metformin | glipizide.metformin | glimepiride.pioglitazone | metformin.rosiglitazone | metformin.pioglitazone | change | diabetesMed |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 3 | 1 | 4 | 0 | 3 | 11 | 1 | 2 | 7 | 19 | 48 | 4 | 11 | 0 | 0 | 0 | 16 | 14 | 12 | 9 | None | None | No | No | No | No | No | No | No | No | No | No | No | No | No | No | No | No | No | No | No | No | No | No | No | No | No |
| 1 | 3 | 0 | 4 | 0 | 1 | 1 | 7 | 2 | 1 | 0 | 31 | 1 | 28 | 0 | 0 | 0 | 27 | 3 | 17 | 5 | None | None | Steady | No | No | No | No | No | No | Steady | No | No | Steady | No | No | No | No | No | No | Up | No | No | No | No | No | Ch | Yes |
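Because `replace` silently leaves any label that is missing from the mapping as a string, it can help to list unmapped labels before converting. A small sketch on toy data; `unmapped_labels` is a hypothetical helper and `payer_map` is a shortened stand-in for the full mapping used above.

```python
import pandas as pd

def unmapped_labels(series, mapping):
    """Return the sorted labels in `series` that have no key in `mapping`."""
    return sorted(set(series.dropna().unique()) - set(mapping))

# Shortened, hypothetical mapping and toy data for illustration only.
payer_map = {'?': 0, 'BC': 1, 'MC': 7}
toy = pd.Series(['MC', 'BC', '?', 'XX', 'MC'])

missing = unmapped_labels(toy, payer_map)
print(missing)
```

An empty list means every label present in the column has a code.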
#distribution of payer_code categories in the payer_code column
score.groupby('payer_code').count()
| payer_code | race | gender | age | weight | admission_type_id | discharge_disposition_id | admission_source_id | time_in_hospital | MED_SPEC_NUM | num_lab_procedures | num_procedures | num_medications | number_outpatient | number_emergency | number_inpatient | DIAG_CAT_1 | DIAG_CAT_2 | DIAG_CAT_3 | number_diagnoses | max_glu_serum | A1Cresult | metformin | repaglinide | nateglinide | chlorpropamide | glimepiride | acetohexamide | glipizide | glyburide | tolbutamide | pioglitazone | rosiglitazone | acarbose | miglitol | troglitazone | tolazamide | examide | citoglipton | insulin | glyburide.metformin | glipizide.metformin | glimepiride.pioglitazone | metformin.rosiglitazone | metformin.pioglitazone | change | diabetesMed |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 5513 | 5513 | 5513 | 5513 | 5513 | 5513 | 5513 | 5513 | 5513 | 5513 | 5513 | 5513 | 5513 | 5513 | 5513 | 5513 | 5513 | 5513 | 5513 | 5513 | 5513 | 5513 | 5513 | 5513 | 5513 | 5513 | 5513 | 5513 | 5513 | 5513 | 5513 | 5513 | 5513 | 5513 | 5513 | 5513 | 5513 | 5513 | 5513 | 5513 | 5513 | 5513 | 5513 | 5513 | 5513 | 5513 |
| 1 | 668 | 668 | 668 | 668 | 668 | 668 | 668 | 668 | 668 | 668 | 668 | 668 | 668 | 668 | 668 | 668 | 668 | 668 | 668 | 668 | 668 | 668 | 668 | 668 | 668 | 668 | 668 | 668 | 668 | 668 | 668 | 668 | 668 | 668 | 668 | 668 | 668 | 668 | 668 | 668 | 668 | 668 | 668 | 668 | 668 | 668 |
| 2 | 20 | 20 | 20 | 20 | 20 | 20 | 20 | 20 | 20 | 20 | 20 | 20 | 20 | 20 | 20 | 20 | 20 | 20 | 20 | 20 | 20 | 20 | 20 | 20 | 20 | 20 | 20 | 20 | 20 | 20 | 20 | 20 | 20 | 20 | 20 | 20 | 20 | 20 | 20 | 20 | 20 | 20 | 20 | 20 | 20 | 20 |
| 3 | 274 | 274 | 274 | 274 | 274 | 274 | 274 | 274 | 274 | 274 | 274 | 274 | 274 | 274 | 274 | 274 | 274 | 274 | 274 | 274 | 274 | 274 | 274 | 274 | 274 | 274 | 274 | 274 | 274 | 274 | 274 | 274 | 274 | 274 | 274 | 274 | 274 | 274 | 274 | 274 | 274 | 274 | 274 | 274 | 274 | 274 |
| 4 | 315 | 315 | 315 | 315 | 315 | 315 | 315 | 315 | 315 | 315 | 315 | 315 | 315 | 315 | 315 | 315 | 315 | 315 | 315 | 315 | 315 | 315 | 315 | 315 | 315 | 315 | 315 | 315 | 315 | 315 | 315 | 315 | 315 | 315 | 315 | 315 | 315 | 315 | 315 | 315 | 315 | 315 | 315 | 315 | 315 | 315 |
| 5 | 69 | 69 | 69 | 69 | 69 | 69 | 69 | 69 | 69 | 69 | 69 | 69 | 69 | 69 | 69 | 69 | 69 | 69 | 69 | 69 | 69 | 69 | 69 | 69 | 69 | 69 | 69 | 69 | 69 | 69 | 69 | 69 | 69 | 69 | 69 | 69 | 69 | 69 | 69 | 69 | 69 | 69 | 69 | 69 | 69 | 69 |
| 6 | 906 | 906 | 906 | 906 | 906 | 906 | 906 | 906 | 906 | 906 | 906 | 906 | 906 | 906 | 906 | 906 | 906 | 906 | 906 | 906 | 906 | 906 | 906 | 906 | 906 | 906 | 906 | 906 | 906 | 906 | 906 | 906 | 906 | 906 | 906 | 906 | 906 | 906 | 906 | 906 | 906 | 906 | 906 | 906 | 906 | 906 |
| 7 | 4437 | 4437 | 4437 | 4437 | 4437 | 4437 | 4437 | 4437 | 4437 | 4437 | 4437 | 4437 | 4437 | 4437 | 4437 | 4437 | 4437 | 4437 | 4437 | 4437 | 4437 | 4437 | 4437 | 4437 | 4437 | 4437 | 4437 | 4437 | 4437 | 4437 | 4437 | 4437 | 4437 | 4437 | 4437 | 4437 | 4437 | 4437 | 4437 | 4437 | 4437 | 4437 | 4437 | 4437 | 4437 | 4437 |
| 8 | 484 | 484 | 484 | 484 | 484 | 484 | 484 | 484 | 484 | 484 | 484 | 484 | 484 | 484 | 484 | 484 | 484 | 484 | 484 | 484 | 484 | 484 | 484 | 484 | 484 | 484 | 484 | 484 | 484 | 484 | 484 | 484 | 484 | 484 | 484 | 484 | 484 | 484 | 484 | 484 | 484 | 484 | 484 | 484 | 484 | 484 |
| 9 | 14 | 14 | 14 | 14 | 14 | 14 | 14 | 14 | 14 | 14 | 14 | 14 | 14 | 14 | 14 | 14 | 14 | 14 | 14 | 14 | 14 | 14 | 14 | 14 | 14 | 14 | 14 | 14 | 14 | 14 | 14 | 14 | 14 | 14 | 14 | 14 | 14 | 14 | 14 | 14 | 14 | 14 | 14 | 14 | 14 | 14 |
| 10 | 142 | 142 | 142 | 142 | 142 | 142 | 142 | 142 | 142 | 142 | 142 | 142 | 142 | 142 | 142 | 142 | 142 | 142 | 142 | 142 | 142 | 142 | 142 | 142 | 142 | 142 | 142 | 142 | 142 | 142 | 142 | 142 | 142 | 142 | 142 | 142 | 142 | 142 | 142 | 142 | 142 | 142 | 142 | 142 | 142 | 142 |
| 11 | 15 | 15 | 15 | 15 | 15 | 15 | 15 | 15 | 15 | 15 | 15 | 15 | 15 | 15 | 15 | 15 | 15 | 15 | 15 | 15 | 15 | 15 | 15 | 15 | 15 | 15 | 15 | 15 | 15 | 15 | 15 | 15 | 15 | 15 | 15 | 15 | 15 | 15 | 15 | 15 | 15 | 15 | 15 | 15 | 15 | 15 |
| 12 | 85 | 85 | 85 | 85 | 85 | 85 | 85 | 85 | 85 | 85 | 85 | 85 | 85 | 85 | 85 | 85 | 85 | 85 | 85 | 85 | 85 | 85 | 85 | 85 | 85 | 85 | 85 | 85 | 85 | 85 | 85 | 85 | 85 | 85 | 85 | 85 | 85 | 85 | 85 | 85 | 85 | 85 | 85 | 85 | 85 | 85 |
| 13 | 6 | 6 | 6 | 6 | 6 | 6 | 6 | 6 | 6 | 6 | 6 | 6 | 6 | 6 | 6 | 6 | 6 | 6 | 6 | 6 | 6 | 6 | 6 | 6 | 6 | 6 | 6 | 6 | 6 | 6 | 6 | 6 | 6 | 6 | 6 | 6 | 6 | 6 | 6 | 6 | 6 | 6 | 6 | 6 | 6 | 6 |
| 14 | 669 | 669 | 669 | 669 | 669 | 669 | 669 | 669 | 669 | 669 | 669 | 669 | 669 | 669 | 669 | 669 | 669 | 669 | 669 | 669 | 669 | 669 | 669 | 669 | 669 | 669 | 669 | 669 | 669 | 669 | 669 | 669 | 669 | 669 | 669 | 669 | 669 | 669 | 669 | 669 | 669 | 669 | 669 | 669 | 669 | 669 |
| 15 | 354 | 354 | 354 | 354 | 354 | 354 | 354 | 354 | 354 | 354 | 354 | 354 | 354 | 354 | 354 | 354 | 354 | 354 | 354 | 354 | 354 | 354 | 354 | 354 | 354 | 354 | 354 | 354 | 354 | 354 | 354 | 354 | 354 | 354 | 354 | 354 | 354 | 354 | 354 | 354 | 354 | 354 | 354 | 354 | 354 | 354 |
| 16 | 28 | 28 | 28 | 28 | 28 | 28 | 28 | 28 | 28 | 28 | 28 | 28 | 28 | 28 | 28 | 28 | 28 | 28 | 28 | 28 | 28 | 28 | 28 | 28 | 28 | 28 | 28 | 28 | 28 | 28 | 28 | 28 | 28 | 28 | 28 | 28 | 28 | 28 | 28 | 28 | 28 | 28 | 28 | 28 | 28 | 28 |
| 19 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 |
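`groupby(...).count()` repeats the same count for every column when there are no missing values. A more compact view of the same distribution, sketched on toy data (the frame `toy` is made up for illustration):

```python
import pandas as pd

# Toy data for illustration, not taken from the validation file.
toy = pd.DataFrame({'payer_code': [0, 7, 7, 1, 0, 7],
                    'age': [4, 4, 5, 6, 7, 8]})

# One count per category, sorted by code rather than by frequency.
counts = toy['payer_code'].value_counts().sort_index()

# The same information via groupby, without one copy per column.
sizes = toy.groupby('payer_code').size()

print(counts)
```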
score.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 14000 entries, 0 to 13999
Data columns (total 47 columns):
race 14000 non-null int64
gender 14000 non-null int64
age 14000 non-null int64
weight 14000 non-null int64
admission_type_id 14000 non-null int64
discharge_disposition_id 14000 non-null int64
admission_source_id 14000 non-null int64
time_in_hospital 14000 non-null int64
payer_code 14000 non-null int64
MED_SPEC_NUM 14000 non-null int64
num_lab_procedures 14000 non-null int64
num_procedures 14000 non-null int64
num_medications 14000 non-null int64
number_outpatient 14000 non-null int64
number_emergency 14000 non-null int64
number_inpatient 14000 non-null int64
DIAG_CAT_1 14000 non-null int64
DIAG_CAT_2 14000 non-null int64
DIAG_CAT_3 14000 non-null int64
number_diagnoses 14000 non-null int64
max_glu_serum 14000 non-null object
A1Cresult 14000 non-null object
metformin 14000 non-null object
repaglinide 14000 non-null object
nateglinide 14000 non-null object
chlorpropamide 14000 non-null object
glimepiride 14000 non-null object
acetohexamide 14000 non-null object
glipizide 14000 non-null object
glyburide 14000 non-null object
tolbutamide 14000 non-null object
pioglitazone 14000 non-null object
rosiglitazone 14000 non-null object
acarbose 14000 non-null object
miglitol 14000 non-null object
troglitazone 14000 non-null object
tolazamide 14000 non-null object
examide 14000 non-null object
citoglipton 14000 non-null object
insulin 14000 non-null object
glyburide.metformin 14000 non-null object
glipizide.metformin 14000 non-null object
glimepiride.pioglitazone 14000 non-null object
metformin.rosiglitazone 14000 non-null object
metformin.pioglitazone 14000 non-null object
change 14000 non-null object
diabetesMed 14000 non-null object
dtypes: int64(20), object(27)
memory usage: 5.0+ MB
#replace the values of the 'max_glu_serum' column:
# None = 0
# Norm = 1
# >200 = 2
# >300 = 3
score = score.replace({'max_glu_serum': {'None': 0, 'Norm': 1, '>200': 2, '>300': 3}})
score.head(2)
| | race | gender | age | weight | admission_type_id | discharge_disposition_id | admission_source_id | time_in_hospital | payer_code | MED_SPEC_NUM | num_lab_procedures | num_procedures | num_medications | number_outpatient | number_emergency | number_inpatient | DIAG_CAT_1 | DIAG_CAT_2 | DIAG_CAT_3 | number_diagnoses | max_glu_serum | A1Cresult | metformin | repaglinide | nateglinide | chlorpropamide | glimepiride | acetohexamide | glipizide | glyburide | tolbutamide | pioglitazone | rosiglitazone | acarbose | miglitol | troglitazone | tolazamide | examide | citoglipton | insulin | glyburide.metformin | glipizide.metformin | glimepiride.pioglitazone | metformin.rosiglitazone | metformin.pioglitazone | change | diabetesMed |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 3 | 1 | 4 | 0 | 3 | 11 | 1 | 2 | 7 | 19 | 48 | 4 | 11 | 0 | 0 | 0 | 16 | 14 | 12 | 9 | 0 | None | No | No | No | No | No | No | No | No | No | No | No | No | No | No | No | No | No | No | No | No | No | No | No | No | No |
| 1 | 3 | 0 | 4 | 0 | 1 | 1 | 7 | 2 | 1 | 0 | 31 | 1 | 28 | 0 | 0 | 0 | 27 | 3 | 17 | 5 | 0 | None | Steady | No | No | No | No | No | No | Steady | No | No | Steady | No | No | No | No | No | No | Up | No | No | No | No | No | Ch | Yes |
#replace the values of the 'A1Cresult' column:
# None = 0
# Norm = 1
# >7 = 2
# >8 = 3
score = score.replace({'A1Cresult': {'None': 0, 'Norm': 1, '>7': 2, '>8': 3}})
score.head(2)
| | race | gender | age | weight | admission_type_id | discharge_disposition_id | admission_source_id | time_in_hospital | payer_code | MED_SPEC_NUM | num_lab_procedures | num_procedures | num_medications | number_outpatient | number_emergency | number_inpatient | DIAG_CAT_1 | DIAG_CAT_2 | DIAG_CAT_3 | number_diagnoses | max_glu_serum | A1Cresult | metformin | repaglinide | nateglinide | chlorpropamide | glimepiride | acetohexamide | glipizide | glyburide | tolbutamide | pioglitazone | rosiglitazone | acarbose | miglitol | troglitazone | tolazamide | examide | citoglipton | insulin | glyburide.metformin | glipizide.metformin | glimepiride.pioglitazone | metformin.rosiglitazone | metformin.pioglitazone | change | diabetesMed |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 3 | 1 | 4 | 0 | 3 | 11 | 1 | 2 | 7 | 19 | 48 | 4 | 11 | 0 | 0 | 0 | 16 | 14 | 12 | 9 | 0 | 0 | No | No | No | No | No | No | No | No | No | No | No | No | No | No | No | No | No | No | No | No | No | No | No | No | No |
| 1 | 3 | 0 | 4 | 0 | 1 | 1 | 7 | 2 | 1 | 0 | 31 | 1 | 28 | 0 | 0 | 0 | 27 | 3 | 17 | 5 | 0 | 0 | Steady | No | No | No | No | No | No | Steady | No | No | Steady | No | No | No | No | No | No | Up | No | No | No | No | No | Ch | Yes |
#replace the values of the 'change' column:
# No = 0
# Ch = 1
score = score.replace({'change': {'No': 0, 'Ch': 1}})
score.head(2)
| | race | gender | age | weight | admission_type_id | discharge_disposition_id | admission_source_id | time_in_hospital | payer_code | MED_SPEC_NUM | num_lab_procedures | num_procedures | num_medications | number_outpatient | number_emergency | number_inpatient | DIAG_CAT_1 | DIAG_CAT_2 | DIAG_CAT_3 | number_diagnoses | max_glu_serum | A1Cresult | metformin | repaglinide | nateglinide | chlorpropamide | glimepiride | acetohexamide | glipizide | glyburide | tolbutamide | pioglitazone | rosiglitazone | acarbose | miglitol | troglitazone | tolazamide | examide | citoglipton | insulin | glyburide.metformin | glipizide.metformin | glimepiride.pioglitazone | metformin.rosiglitazone | metformin.pioglitazone | change | diabetesMed |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 3 | 1 | 4 | 0 | 3 | 11 | 1 | 2 | 7 | 19 | 48 | 4 | 11 | 0 | 0 | 0 | 16 | 14 | 12 | 9 | 0 | 0 | No | No | No | No | No | No | No | No | No | No | No | No | No | No | No | No | No | No | No | No | No | No | No | 0 | No |
| 1 | 3 | 0 | 4 | 0 | 1 | 1 | 7 | 2 | 1 | 0 | 31 | 1 | 28 | 0 | 0 | 0 | 27 | 3 | 17 | 5 | 0 | 0 | Steady | No | No | No | No | No | No | Steady | No | No | Steady | No | No | No | No | No | No | Up | No | No | No | No | No | 1 | Yes |
#replace the values of the 'diabetesMed' column:
# No = 0
# Yes = 1
score = score.replace({'diabetesMed': {'No': 0, 'Yes': 1,}})
score.head(2)
| | race | gender | age | weight | admission_type_id | discharge_disposition_id | admission_source_id | time_in_hospital | payer_code | MED_SPEC_NUM | num_lab_procedures | num_procedures | num_medications | number_outpatient | number_emergency | number_inpatient | DIAG_CAT_1 | DIAG_CAT_2 | DIAG_CAT_3 | number_diagnoses | max_glu_serum | A1Cresult | metformin | repaglinide | nateglinide | chlorpropamide | glimepiride | acetohexamide | glipizide | glyburide | tolbutamide | pioglitazone | rosiglitazone | acarbose | miglitol | troglitazone | tolazamide | examide | citoglipton | insulin | glyburide.metformin | glipizide.metformin | glimepiride.pioglitazone | metformin.rosiglitazone | metformin.pioglitazone | change | diabetesMed |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 3 | 1 | 4 | 0 | 3 | 11 | 1 | 2 | 7 | 19 | 48 | 4 | 11 | 0 | 0 | 0 | 16 | 14 | 12 | 9 | 0 | 0 | No | No | No | No | No | No | No | No | No | No | No | No | No | No | No | No | No | No | No | No | No | No | No | 0 | 0 |
| 1 | 3 | 0 | 4 | 0 | 1 | 1 | 7 | 2 | 1 | 0 | 31 | 1 | 28 | 0 | 0 | 0 | 27 | 3 | 17 | 5 | 0 | 0 | Steady | No | No | No | No | No | No | Steady | No | No | Steady | No | No | No | No | No | No | Up | No | No | No | No | No | 1 | 1 |
#replace the values in the medication columns:
# No = 0
# Down = 1
# Steady = 2
# Up = 3
med_cols = ['metformin', 'repaglinide', 'nateglinide', 'chlorpropamide', 'glimepiride', 'acetohexamide', 'glipizide', 'glyburide', 'tolbutamide', 'pioglitazone', 'rosiglitazone', 'acarbose', 'miglitol', 'troglitazone', 'tolazamide', 'examide', 'citoglipton', 'insulin', 'glyburide.metformin', 'glipizide.metformin', 'glimepiride.pioglitazone', 'metformin.rosiglitazone', 'metformin.pioglitazone']
dose_map = {'No': 0, 'Down': 1, 'Steady': 2, 'Up': 3}
# apply the same mapping to every medication column in one call
score = score.replace({col: dose_map for col in med_cols})
score.head(2)
| | race | gender | age | weight | admission_type_id | discharge_disposition_id | admission_source_id | time_in_hospital | payer_code | MED_SPEC_NUM | num_lab_procedures | num_procedures | num_medications | number_outpatient | number_emergency | number_inpatient | DIAG_CAT_1 | DIAG_CAT_2 | DIAG_CAT_3 | number_diagnoses | max_glu_serum | A1Cresult | metformin | repaglinide | nateglinide | chlorpropamide | glimepiride | acetohexamide | glipizide | glyburide | tolbutamide | pioglitazone | rosiglitazone | acarbose | miglitol | troglitazone | tolazamide | examide | citoglipton | insulin | glyburide.metformin | glipizide.metformin | glimepiride.pioglitazone | metformin.rosiglitazone | metformin.pioglitazone | change | diabetesMed |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 3 | 1 | 4 | 0 | 3 | 11 | 1 | 2 | 7 | 19 | 48 | 4 | 11 | 0 | 0 | 0 | 16 | 14 | 12 | 9 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 1 | 3 | 0 | 4 | 0 | 1 | 1 | 7 | 2 | 1 | 0 | 31 | 1 | 28 | 0 | 0 | 0 | 27 | 3 | 17 | 5 | 0 | 0 | 2 | 0 | 0 | 0 | 0 | 0 | 0 | 2 | 0 | 0 | 2 | 0 | 0 | 0 | 0 | 0 | 0 | 3 | 0 | 0 | 0 | 0 | 0 | 1 | 1 |
# keep a copy of the converted, all-integer data frame under a new name
score_clean_NoString = score.copy()
#info
score_clean_NoString.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 14000 entries, 0 to 13999
Data columns (total 47 columns):
race 14000 non-null int64
gender 14000 non-null int64
age 14000 non-null int64
weight 14000 non-null int64
admission_type_id 14000 non-null int64
discharge_disposition_id 14000 non-null int64
admission_source_id 14000 non-null int64
time_in_hospital 14000 non-null int64
payer_code 14000 non-null int64
MED_SPEC_NUM 14000 non-null int64
num_lab_procedures 14000 non-null int64
num_procedures 14000 non-null int64
num_medications 14000 non-null int64
number_outpatient 14000 non-null int64
number_emergency 14000 non-null int64
number_inpatient 14000 non-null int64
DIAG_CAT_1 14000 non-null int64
DIAG_CAT_2 14000 non-null int64
DIAG_CAT_3 14000 non-null int64
number_diagnoses 14000 non-null int64
max_glu_serum 14000 non-null int64
A1Cresult 14000 non-null int64
metformin 14000 non-null int64
repaglinide 14000 non-null int64
nateglinide 14000 non-null int64
chlorpropamide 14000 non-null int64
glimepiride 14000 non-null int64
acetohexamide 14000 non-null int64
glipizide 14000 non-null int64
glyburide 14000 non-null int64
tolbutamide 14000 non-null int64
pioglitazone 14000 non-null int64
rosiglitazone 14000 non-null int64
acarbose 14000 non-null int64
miglitol 14000 non-null int64
troglitazone 14000 non-null int64
tolazamide 14000 non-null int64
examide 14000 non-null int64
citoglipton 14000 non-null int64
insulin 14000 non-null int64
glyburide.metformin 14000 non-null int64
glipizide.metformin 14000 non-null int64
glimepiride.pioglitazone 14000 non-null int64
metformin.rosiglitazone 14000 non-null int64
metformin.pioglitazone 14000 non-null int64
change 14000 non-null int64
diabetesMed 14000 non-null int64
dtypes: int64(47)
memory usage: 5.0 MB
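Reading through 47 dtype lines is easy to get wrong, so the "all columns are numeric" check can also be done programmatically. A minimal sketch on toy data; `non_numeric_columns` is a hypothetical helper, and the toy conversion uses `Series.map` rather than the `replace` calls used in this notebook.

```python
import pandas as pd

def non_numeric_columns(frame):
    """Return the names of columns whose dtype is not numeric."""
    return frame.select_dtypes(exclude='number').columns.tolist()

# Toy data for illustration, not taken from the validation file.
toy = pd.DataFrame({'gender': [1, 0], 'insulin': ['No', 'Up'], 'change': [0, 1]})
before = non_numeric_columns(toy)

# Convert the remaining string column with the same coding as above.
dose_map = {'No': 0, 'Down': 1, 'Steady': 2, 'Up': 3}
toy['insulin'] = toy['insulin'].map(dose_map)
after = non_numeric_columns(toy)

print(before, after)
```

An empty list after conversion corresponds to the `dtypes: int64(47)` summary line.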
# write dataframe with no string values to new csv file
score_clean_NoString.to_csv('data/Challenge_1_Validation_Work_Clean_NoString.csv')
#score = pd.read_csv('data/Challenge_1_Validation_Work.csv')
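One thing to keep in mind with `to_csv`: by default it also writes the row index, which comes back as an extra `Unnamed: 0` column when the file is read again. The sketch below shows the difference on toy data, using an in-memory buffer instead of a real file.

```python
import io
import pandas as pd

# Toy data for illustration, not taken from the validation file.
toy = pd.DataFrame({'gender': [1, 0], 'age': [4, 4]})

# Default: the RangeIndex is written as an unnamed first column.
buf = io.StringIO()
toy.to_csv(buf)
with_index = pd.read_csv(io.StringIO(buf.getvalue()))

# index=False: only the data columns are written.
buf = io.StringIO()
toy.to_csv(buf, index=False)
without_index = pd.read_csv(io.StringIO(buf.getvalue()))

print(with_index.columns.tolist())
print(without_index.columns.tolist())
```

If the saved file is meant to be reloaded for modelling, passing `index=False` (or `index_col=0` when reading) avoids carrying the extra column along.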
Random Forest¶
score.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 14000 entries, 0 to 13999
Data columns (total 47 columns):
race 14000 non-null int64
gender 14000 non-null int64
age 14000 non-null int64
weight 14000 non-null int64
admission_type_id 14000 non-null int64
discharge_disposition_id 14000 non-null int64
admission_source_id 14000 non-null int64
time_in_hospital 14000 non-null int64
payer_code 14000 non-null int64
MED_SPEC_NUM 14000 non-null int64
num_lab_procedures 14000 non-null int64
num_procedures 14000 non-null int64
num_medications 14000 non-null int64
number_outpatient 14000 non-null int64
number_emergency 14000 non-null int64
number_inpatient 14000 non-null int64
DIAG_CAT_1 14000 non-null int64
DIAG_CAT_2 14000 non-null int64
DIAG_CAT_3 14000 non-null int64
number_diagnoses 14000 non-null int64
max_glu_serum 14000 non-null int64
A1Cresult 14000 non-null int64
metformin 14000 non-null int64
repaglinide 14000 non-null int64
nateglinide 14000 non-null int64
chlorpropamide 14000 non-null int64
glimepiride 14000 non-null int64
acetohexamide 14000 non-null int64
glipizide 14000 non-null int64
glyburide 14000 non-null int64
tolbutamide 14000 non-null int64
pioglitazone 14000 non-null int64
rosiglitazone 14000 non-null int64
acarbose 14000 non-null int64
miglitol 14000 non-null int64
troglitazone 14000 non-null int64
tolazamide 14000 non-null int64
examide 14000 non-null int64
citoglipton 14000 non-null int64
insulin 14000 non-null int64
glyburide.metformin 14000 non-null int64
glipizide.metformin 14000 non-null int64
glimepiride.pioglitazone 14000 non-null int64
metformin.rosiglitazone 14000 non-null int64
metformin.pioglitazone 14000 non-null int64
change 14000 non-null int64
diabetesMed 14000 non-null int64
dtypes: int64(47)
memory usage: 5.0 MB
# Set Y and X
y = df['readm2']
X = df.drop(['readm2'], axis=1)
X.head(12)
| | race | gender | age | weight | admission_type_id | discharge_disposition_id | admission_source_id | time_in_hospital | payer_code | MED_SPEC_NUM | num_lab_procedures | num_procedures | num_medications | number_outpatient | number_emergency | number_inpatient | DIAG_CAT_1 | DIAG_CAT_2 | DIAG_CAT_3 | number_diagnoses | max_glu_serum | A1Cresult | metformin | glimepiride | glipizide | glyburide | pioglitazone | rosiglitazone | insulin | change | diabetesMed |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 3 | 0 | 8 | 0 | 3 | 1 | 4 | 5 | 0 | 0 | 39 | 3 | 11 | 0 | 0 | 0 | 10 | 4 | 18 | 7 | 0 | 0 | 0 | 0 | 0 | 2 | 0 | 0 | 0 | 0 | 1 |
| 1 | 3 | 0 | 7 | 0 | 5 | 3 | 1 | 6 | 7 | 0 | 79 | 1 | 25 | 3 | 0 | 0 | 16 | 13 | 16 | 9 | 0 | 0 | 2 | 0 | 0 | 2 | 0 | 0 | 3 | 1 | 1 |
| 2 | 5 | 0 | 6 | 0 | 1 | 22 | 7 | 4 | 14 | 18 | 29 | 2 | 18 | 0 | 0 | 1 | 24 | 18 | 2 | 9 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 2 | 0 | 1 |
| 3 | 3 | 1 | 7 | 0 | 1 | 1 | 7 | 3 | 0 | 18 | 72 | 3 | 18 | 0 | 0 | 0 | 17 | 4 | 3 | 9 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 2 | 0 | 1 |
| 4 | 3 | 0 | 3 | 0 | 2 | 1 | 1 | 3 | 0 | 0 | 21 | 1 | 6 | 0 | 0 | 0 | 23 | 18 | 32 | 9 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 5 | 3 | 1 | 7 | 0 | 2 | 1 | 1 | 2 | 0 | 18 | 4 | 0 | 7 | 0 | 0 | 0 | 14 | 9 | 3 | 8 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 6 | 3 | 0 | 4 | 0 | 1 | 1 | 7 | 6 | 14 | 33 | 89 | 0 | 25 | 0 | 2 | 1 | 25 | 10 | 16 | 9 | 0 | 0 | 2 | 0 | 0 | 0 | 0 | 0 | 2 | 1 | 1 |
| 7 | 3 | 0 | 7 | 0 | 1 | 6 | 7 | 4 | 6 | 0 | 63 | 0 | 22 | 0 | 2 | 4 | 16 | 3 | 3 | 5 | 0 | 0 | 2 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 |
| 8 | 3 | 1 | 6 | 0 | 1 | 1 | 7 | 6 | 7 | 0 | 45 | 0 | 24 | 0 | 0 | 0 | 16 | 9 | 3 | 7 | 0 | 3 | 2 | 0 | 0 | 0 | 2 | 0 | 1 | 1 | 1 |
| 9 | 3 | 1 | 8 | 0 | 1 | 1 | 7 | 2 | 3 | 0 | 45 | 0 | 13 | 0 | 0 | 0 | 17 | 3 | 3 | 9 | 0 | 0 | 0 | 0 | 2 | 0 | 0 | 0 | 1 | 1 | 1 |
| 10 | 3 | 1 | 6 | 0 | 2 | 1 | 1 | 3 | 7 | 0 | 57 | 6 | 21 | 0 | 0 | 0 | 12 | 10 | 4 | 9 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 2 | 0 | 1 |
| 11 | 3 | 0 | 4 | 0 | 6 | 1 | 17 | 6 | 0 | 18 | 81 | 0 | 26 | 0 | 0 | 0 | 13 | 21 | 21 | 9 | 0 | 3 | 0 | 0 | 2 | 0 | 2 | 0 | 0 | 1 | 1 |
df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 56000 entries, 0 to 55999
Data columns (total 32 columns):
race 56000 non-null int64
gender 56000 non-null int64
age 56000 non-null int64
weight 56000 non-null int64
admission_type_id 56000 non-null int64
discharge_disposition_id 56000 non-null int64
admission_source_id 56000 non-null int64
time_in_hospital 56000 non-null int64
payer_code 56000 non-null int64
MED_SPEC_NUM 56000 non-null int64
num_lab_procedures 56000 non-null int64
num_procedures 56000 non-null int64
num_medications 56000 non-null int64
number_outpatient 56000 non-null int64
number_emergency 56000 non-null int64
number_inpatient 56000 non-null int64
DIAG_CAT_1 56000 non-null int64
DIAG_CAT_2 56000 non-null int64
DIAG_CAT_3 56000 non-null int64
number_diagnoses 56000 non-null int64
max_glu_serum 56000 non-null int64
A1Cresult 56000 non-null int64
metformin 56000 non-null int64
glimepiride 56000 non-null int64
glipizide 56000 non-null int64
glyburide 56000 non-null int64
pioglitazone 56000 non-null int64
rosiglitazone 56000 non-null int64
insulin 56000 non-null int64
change 56000 non-null int64
diabetesMed 56000 non-null int64
readm2 56000 non-null int64
dtypes: int64(32)
memory usage: 13.7 MB
score.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 14000 entries, 0 to 13999
Data columns (total 47 columns):
race 14000 non-null int64
gender 14000 non-null int64
age 14000 non-null int64
weight 14000 non-null int64
admission_type_id 14000 non-null int64
discharge_disposition_id 14000 non-null int64
admission_source_id 14000 non-null int64
time_in_hospital 14000 non-null int64
payer_code 14000 non-null int64
MED_SPEC_NUM 14000 non-null int64
num_lab_procedures 14000 non-null int64
num_procedures 14000 non-null int64
num_medications 14000 non-null int64
number_outpatient 14000 non-null int64
number_emergency 14000 non-null int64
number_inpatient 14000 non-null int64
DIAG_CAT_1 14000 non-null int64
DIAG_CAT_2 14000 non-null int64
DIAG_CAT_3 14000 non-null int64
number_diagnoses 14000 non-null int64
max_glu_serum 14000 non-null int64
A1Cresult 14000 non-null int64
metformin 14000 non-null int64
repaglinide 14000 non-null int64
nateglinide 14000 non-null int64
chlorpropamide 14000 non-null int64
glimepiride 14000 non-null int64
acetohexamide 14000 non-null int64
glipizide 14000 non-null int64
glyburide 14000 non-null int64
tolbutamide 14000 non-null int64
pioglitazone 14000 non-null int64
rosiglitazone 14000 non-null int64
acarbose 14000 non-null int64
miglitol 14000 non-null int64
troglitazone 14000 non-null int64
tolazamide 14000 non-null int64
examide 14000 non-null int64
citoglipton 14000 non-null int64
insulin 14000 non-null int64
glyburide.metformin 14000 non-null int64
glipizide.metformin 14000 non-null int64
glimepiride.pioglitazone 14000 non-null int64
metformin.rosiglitazone 14000 non-null int64
metformin.pioglitazone 14000 non-null int64
change 14000 non-null int64
diabetesMed 14000 non-null int64
dtypes: int64(47)
memory usage: 5.0 MB
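The two listings above show 32 columns for `df` (including the target `readm2`) and 47 for `score`. Rather than comparing them by eye, the extra columns can be computed. This is a sketch on toy frames, with `extra_columns` as a hypothetical helper; the toy column sets are shortened stand-ins for the real ones.

```python
import pandas as pd

def extra_columns(frame, reference_columns):
    """Columns of `frame` that are absent from `reference_columns`, in frame order."""
    reference = set(reference_columns)
    return [c for c in frame.columns if c not in reference]

# Toy frames standing in for the scoring data and the training data.
toy_score = pd.DataFrame({'race': [3], 'examide': [0], 'insulin': [2], 'citoglipton': [0]})
toy_train = pd.DataFrame({'race': [3], 'insulin': [0], 'readm2': [1]})

to_drop = extra_columns(toy_score, toy_train.columns)
aligned = toy_score.drop(columns=to_drop)

print(to_drop)
print(aligned.columns.tolist())
```

After dropping, the scoring frame has the same feature columns as the training features, which is what a fitted model expects at prediction time.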
#drop or remove these columns since they are not used in any of the cases
score = score.drop(['examide', 'citoglipton', 'glimepiride.pioglitazone'], axis=1)
#drop the remaining medication columns that are not in the training data frame (df) and display the result
score = score.drop(['acetohexamide', 'metformin.pioglitazone', 'metformin.rosiglitazone', 'tolazamide', 'tolbutamide', 'troglitazone', 'chlorpropamide', 'glipizide.metformin', 'miglitol', 'acarbose', 'glyburide.metformin', 'nateglinide', 'repaglinide'], axis=1)
score.head(12)
| | race | gender | age | weight | admission_type_id | discharge_disposition_id | admission_source_id | time_in_hospital | payer_code | MED_SPEC_NUM | num_lab_procedures | num_procedures | num_medications | number_outpatient | number_emergency | number_inpatient | DIAG_CAT_1 | DIAG_CAT_2 | DIAG_CAT_3 | number_diagnoses | max_glu_serum | A1Cresult | metformin | glimepiride | glipizide | glyburide | pioglitazone | rosiglitazone | insulin | change | diabetesMed |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 3 | 1 | 4 | 0 | 3 | 11 | 1 | 2 | 7 | 19 | 48 | 4 | 11 | 0 | 0 | 0 | 16 | 14 | 12 | 9 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 1 | 3 | 0 | 4 | 0 | 1 | 1 | 7 | 2 | 1 | 0 | 31 | 1 | 28 | 0 | 0 | 0 | 27 | 3 | 17 | 5 | 0 | 0 | 2 | 0 | 0 | 2 | 0 | 2 | 3 | 1 | 1 |
| 2 | 3 | 1 | 6 | 0 | 6 | 7 | 7 | 1 | 0 | 18 | 42 | 0 | 12 | 0 | 0 | 0 | 23 | 32 | 3 | 3 | 2 | 0 | 0 | 0 | 0 | 2 | 0 | 0 | 0 | 0 | 1 |
| 3 | 3 | 0 | 7 | 0 | 1 | 1 | 7 | 5 | 0 | 0 | 52 | 2 | 25 | 1 | 1 | 0 | 24 | 12 | 13 | 9 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 2 | 1 | 1 |
| 4 | 3 | 0 | 4 | 0 | 2 | 1 | 7 | 2 | 14 | 55 | 41 | 2 | 3 | 0 | 0 | 0 | 15 | 17 | 2 | 9 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 5 | 3 | 1 | 4 | 0 | 5 | 1 | 1 | 5 | 6 | 0 | 1 | 2 | 25 | 1 | 0 | 0 | 32 | 9 | 18 | 5 | 0 | 0 | 2 | 0 | 0 | 2 | 0 | 2 | 2 | 1 | 1 |
| 6 | 1 | 0 | 8 | 0 | 1 | 3 | 7 | 1 | 7 | 0 | 58 | 1 | 11 | 1 | 0 | 1 | 23 | 13 | 23 | 7 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 2 | 0 | 1 |
| 7 | 3 | 1 | 5 | 0 | 1 | 6 | 7 | 5 | 7 | 0 | 54 | 0 | 13 | 0 | 0 | 0 | 6 | 17 | 26 | 8 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 2 | 0 | 1 |
| 8 | 3 | 0 | 8 | 0 | 2 | 3 | 1 | 3 | 0 | 0 | 48 | 0 | 10 | 0 | 0 | 0 | 17 | 3 | 18 | 7 | 0 | 1 | 0 | 0 | 0 | 2 | 0 | 0 | 2 | 1 | 1 |
| 9 | 3 | 0 | 8 | 0 | 1 | 3 | 7 | 4 | 1 | 0 | 41 | 0 | 14 | 0 | 0 | 1 | 24 | 18 | 3 | 9 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 1 | 1 |
| 10 | 3 | 1 | 2 | 0 | 2 | 1 | 2 | 10 | 0 | 0 | 53 | 0 | 20 | 0 | 0 | 0 | 3 | 3 | 3 | 6 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 1 | 1 |
| 11 | 3 | 1 | 4 | 0 | 2 | 6 | 4 | 6 | 0 | 4 | 48 | 2 | 11 | 0 | 0 | 0 | 10 | 10 | 10 | 9 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 2 | 0 | 1 |
score.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 14000 entries, 0 to 13999
Data columns (total 31 columns):
race                        14000 non-null int64
gender                      14000 non-null int64
age                         14000 non-null int64
weight                      14000 non-null int64
admission_type_id           14000 non-null int64
discharge_disposition_id    14000 non-null int64
admission_source_id         14000 non-null int64
time_in_hospital            14000 non-null int64
payer_code                  14000 non-null int64
MED_SPEC_NUM                14000 non-null int64
num_lab_procedures          14000 non-null int64
num_procedures              14000 non-null int64
num_medications             14000 non-null int64
number_outpatient           14000 non-null int64
number_emergency            14000 non-null int64
number_inpatient            14000 non-null int64
DIAG_CAT_1                  14000 non-null int64
DIAG_CAT_2                  14000 non-null int64
DIAG_CAT_3                  14000 non-null int64
number_diagnoses            14000 non-null int64
max_glu_serum               14000 non-null int64
A1Cresult                   14000 non-null int64
metformin                   14000 non-null int64
glimepiride                 14000 non-null int64
glipizide                   14000 non-null int64
glyburide                   14000 non-null int64
pioglitazone                14000 non-null int64
rosiglitazone               14000 non-null int64
insulin                     14000 non-null int64
change                      14000 non-null int64
diabetesMed                 14000 non-null int64
dtypes: int64(31)
memory usage: 3.3 MB
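The drop step above removes columns described as "not used in any of the cases". One way to check a claim like that, assuming it means each column holds a single value in every row, is to count distinct values per column. The sketch below uses an invented toy frame (only the column names come from the notebook); whether this is the criterion the notebook actually applied is an assumption.

```python
import pandas as pd

# Toy stand-in for the scoring data; the values are invented for illustration.
toy = pd.DataFrame({
    'metformin':   [0, 2, 0, 0],
    'examide':     [0, 0, 0, 0],   # constant in this toy data
    'citoglipton': [0, 0, 0, 0],   # constant in this toy data
    'insulin':     [0, 3, 0, 2],
})

# A column with one distinct value cannot help a model separate the classes.
constant_columns = [col for col in toy.columns if toy[col].nunique() == 1]

toy_reduced = toy.drop(columns=constant_columns)
print(constant_columns)
print(list(toy_reduced.columns))
```

Running the same check on the real training frame before dropping would show whether every column in the drop list is in fact constant.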
# develop a random forest model
from sklearn.ensemble import RandomForestClassifier
clf = RandomForestClassifier(n_estimators=20)  # build 20 decision trees
clf = clf.fit(X, y)
clf.score(X, y)  # mean accuracy on the training data
0.99219642857142853
Random Forest training accuracy = 99.22%¶
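The accuracy above is measured on the same rows the model was fitted on, so it is likely to be optimistic. A minimal sketch of a hold-out check follows. It runs on synthetic data from `make_classification` rather than the notebook's `X` and `y`, and it imports `train_test_split` from `sklearn.model_selection`, which is where recent scikit-learn releases keep it (the notebook's `sklearn.cross_validation` module is the older location). The names `X_demo`, `y_demo`, and `demo_clf` are placeholders.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Synthetic, imbalanced stand-in for X and y (roughly 11% positives).
X_demo, y_demo = make_classification(
    n_samples=2000, n_features=31, n_informative=8,
    weights=[0.89, 0.11], random_state=0,
)

# Hold out 30% of the rows, keeping the class ratio the same in both parts.
X_train, X_test, y_train, y_test = train_test_split(
    X_demo, y_demo, test_size=0.3, stratify=y_demo, random_state=0
)

demo_clf = RandomForestClassifier(n_estimators=20, random_state=0)
demo_clf.fit(X_train, y_train)

train_acc = demo_clf.score(X_train, y_train)  # rows the model has seen
test_acc = demo_clf.score(X_test, y_test)     # rows the model has not seen
print(train_acc, test_acc)
```

The gap between the two numbers gives a rough sense of how much of the training accuracy comes from memorizing the training rows.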
# generate evaluation metrics (computed on the training data X, y)
y_pred = clf.predict(X)
print(metrics.accuracy_score(y, y_pred))  # overall accuracy
print(metrics.confusion_matrix(y, y_pred))
print(metrics.classification_report(y, y_pred))
0.992196428571
[[49715 0]
[ 437 5848]]
             precision    recall  f1-score   support

          0       0.99      1.00      1.00     49715
          1       1.00      0.93      0.96      6285

avg / total       0.99      0.99      0.99     56000
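The figures in the report can be reproduced by hand from the confusion matrix, which helps when reading it: rows are the true classes and columns are the predicted classes. The sketch below only redoes that arithmetic with the four counts printed above.

```python
# Counts from the confusion matrix above (rows: true class, columns: predicted class).
tn, fp = 49715, 0
fn, tp = 437, 5848

total = tn + fp + fn + tp
accuracy = (tn + tp) / total
precision_1 = tp / (tp + fp)   # share of predicted readmissions that were real
recall_1 = tp / (tp + fn)      # share of real readmissions that were found
f1_1 = 2 * precision_1 * recall_1 / (precision_1 + recall_1)

print(round(accuracy, 4), round(precision_1, 2), round(recall_1, 2), round(f1_1, 2))
# 0.9922 1.0 0.93 0.96
```

The 437 missed readmissions are the reason recall for class 1 is 0.93 even though precision is 1.00.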
print("Features sorted by their rank:")
print(sorted(zip(map(lambda x: round(x, 4), clf.feature_importances_), X.columns)))
Features sorted by their rank: [(0.0058999999999999999, 'weight'), (0.0071000000000000004, 'max_glu_serum'), (0.0077000000000000002, 'rosiglitazone'), (0.0080999999999999996, 'glimepiride'), (0.0091999999999999998, 'diabetesMed'), (0.0094999999999999998, 'pioglitazone'), (0.0109, 'glyburide'), (0.011900000000000001, 'metformin'), (0.0134, 'glipizide'), (0.0143, 'change'), (0.017000000000000001, 'A1Cresult'), (0.017000000000000001, 'number_emergency'), (0.019, 'gender'), (0.021700000000000001, 'number_outpatient'), (0.023400000000000001, 'race'), (0.023599999999999999, 'admission_source_id'), (0.027099999999999999, 'admission_type_id'), (0.028400000000000002, 'insulin'), (0.035000000000000003, 'number_diagnoses'), (0.0361, 'num_procedures'), (0.037600000000000001, 'discharge_disposition_id'), (0.038600000000000002, 'payer_code'), (0.044400000000000002, 'number_inpatient'), (0.044699999999999997, 'MED_SPEC_NUM'), (0.048599999999999997, 'age'), (0.0591, 'time_in_hospital'), (0.065199999999999994, 'DIAG_CAT_3'), (0.0659, 'DIAG_CAT_1'), (0.070900000000000005, 'DIAG_CAT_2'), (0.081199999999999994, 'num_medications'), (0.097600000000000006, 'num_lab_procedures')]
# another method
pd.DataFrame({'feature':X.columns, 'importance':clf.feature_importances_})
| | feature | importance |
|---|---|---|
| 0 | race | 0.023426 |
| 1 | gender | 0.018953 |
| 2 | age | 0.048555 |
| 3 | weight | 0.005906 |
| 4 | admission_type_id | 0.027056 |
| 5 | discharge_disposition_id | 0.037582 |
| 6 | admission_source_id | 0.023568 |
| 7 | time_in_hospital | 0.059076 |
| 8 | payer_code | 0.038612 |
| 9 | MED_SPEC_NUM | 0.044668 |
| 10 | num_lab_procedures | 0.097644 |
| 11 | num_procedures | 0.036111 |
| 12 | num_medications | 0.081217 |
| 13 | number_outpatient | 0.021676 |
| 14 | number_emergency | 0.017004 |
| 15 | number_inpatient | 0.044374 |
| 16 | DIAG_CAT_1 | 0.065932 |
| 17 | DIAG_CAT_2 | 0.070878 |
| 18 | DIAG_CAT_3 | 0.065177 |
| 19 | number_diagnoses | 0.035034 |
| 20 | max_glu_serum | 0.007139 |
| 21 | A1Cresult | 0.017001 |
| 22 | metformin | 0.011851 |
| 23 | glimepiride | 0.008142 |
| 24 | glipizide | 0.013425 |
| 25 | glyburide | 0.010854 |
| 26 | pioglitazone | 0.009453 |
| 27 | rosiglitazone | 0.007738 |
| 28 | insulin | 0.028408 |
| 29 | change | 0.014314 |
| 30 | diabetesMed | 0.009225 |
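The table above lists features in column order, which makes the strongest ones hard to spot. A small sketch of sorting such a frame by importance follows. It uses five rows copied from the table instead of the fitted `clf`, so the frame name `importances` is a placeholder.

```python
import pandas as pd

# A few (feature, importance) pairs copied from the table above.
importances = pd.DataFrame({
    'feature': ['race', 'age', 'weight', 'num_lab_procedures', 'num_medications'],
    'importance': [0.023426, 0.048555, 0.005906, 0.097644, 0.081217],
})

# Largest importance first, with a fresh 0..n-1 index.
ranked = importances.sort_values('importance', ascending=False).reset_index(drop=True)
print(ranked)
```

Applied to the full frame, the same `sort_values` call would put `num_lab_procedures` and `num_medications` at the top.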
#Predict class probabilities for X
clf.predict_proba(X)
array([[ 0.95, 0.05],
[ 1. , 0. ],
[ 0.95, 0.05],
...,
[ 0.95, 0.05],
[ 0.95, 0.05],
[ 0.8 , 0.2 ]])
Make predictions on the new dataset (scoring dataset without y value)¶
score.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 14000 entries, 0 to 13999
Data columns (total 31 columns):
race                        14000 non-null int64
gender                      14000 non-null int64
age                         14000 non-null int64
weight                      14000 non-null int64
admission_type_id           14000 non-null int64
discharge_disposition_id    14000 non-null int64
admission_source_id         14000 non-null int64
time_in_hospital            14000 non-null int64
payer_code                  14000 non-null int64
MED_SPEC_NUM                14000 non-null int64
num_lab_procedures          14000 non-null int64
num_procedures              14000 non-null int64
num_medications             14000 non-null int64
number_outpatient           14000 non-null int64
number_emergency            14000 non-null int64
number_inpatient            14000 non-null int64
DIAG_CAT_1                  14000 non-null int64
DIAG_CAT_2                  14000 non-null int64
DIAG_CAT_3                  14000 non-null int64
number_diagnoses            14000 non-null int64
max_glu_serum               14000 non-null int64
A1Cresult                   14000 non-null int64
metformin                   14000 non-null int64
glimepiride                 14000 non-null int64
glipizide                   14000 non-null int64
glyburide                   14000 non-null int64
pioglitazone                14000 non-null int64
rosiglitazone               14000 non-null int64
insulin                     14000 non-null int64
change                      14000 non-null int64
diabetesMed                 14000 non-null int64
dtypes: int64(31)
memory usage: 3.3 MB
score.head()
| | race | gender | age | weight | admission_type_id | discharge_disposition_id | admission_source_id | time_in_hospital | payer_code | MED_SPEC_NUM | num_lab_procedures | num_procedures | num_medications | number_outpatient | number_emergency | number_inpatient | DIAG_CAT_1 | DIAG_CAT_2 | DIAG_CAT_3 | number_diagnoses | max_glu_serum | A1Cresult | metformin | glimepiride | glipizide | glyburide | pioglitazone | rosiglitazone | insulin | change | diabetesMed |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 3 | 1 | 4 | 0 | 3 | 11 | 1 | 2 | 7 | 19 | 48 | 4 | 11 | 0 | 0 | 0 | 16 | 14 | 12 | 9 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 1 | 3 | 0 | 4 | 0 | 1 | 1 | 7 | 2 | 1 | 0 | 31 | 1 | 28 | 0 | 0 | 0 | 27 | 3 | 17 | 5 | 0 | 0 | 2 | 0 | 0 | 2 | 0 | 2 | 3 | 1 | 1 |
| 2 | 3 | 1 | 6 | 0 | 6 | 7 | 7 | 1 | 0 | 18 | 42 | 0 | 12 | 0 | 0 | 0 | 23 | 32 | 3 | 3 | 2 | 0 | 0 | 0 | 0 | 2 | 0 | 0 | 0 | 0 | 1 |
| 3 | 3 | 0 | 7 | 0 | 1 | 1 | 7 | 5 | 0 | 0 | 52 | 2 | 25 | 1 | 1 | 0 | 24 | 12 | 13 | 9 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 2 | 1 | 1 |
| 4 | 3 | 0 | 4 | 0 | 2 | 1 | 7 | 2 | 14 | 55 | 41 | 2 | 3 | 0 | 0 | 0 | 15 | 17 | 2 | 9 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
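Before calling `predict` on the scoring data, it can help to confirm that its columns match the training features in both name and order, since the model was fitted on a specific column layout. The following is a sketch on two tiny invented frames; `X_demo` and `score_demo` are placeholders for the notebook's `X` and `score`.

```python
import pandas as pd

# Invented stand-ins: training features, and a scoring frame with one
# extra column and a different column order.
X_demo = pd.DataFrame({'race': [3, 1], 'gender': [1, 0], 'age': [4, 8]})
score_demo = pd.DataFrame({'age': [6, 7], 'examide': [0, 0],
                           'race': [3, 3], 'gender': [1, 0]})

extra = [c for c in score_demo.columns if c not in X_demo.columns]
missing = [c for c in X_demo.columns if c not in score_demo.columns]
print(extra, missing)

# Keep only the training columns, in the training order.
score_aligned = score_demo[list(X_demo.columns)]
print(list(score_aligned.columns))
```

If `missing` is not empty, the scoring data cannot be passed to the model as is, and selecting by the training column list would raise a `KeyError`.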
#score=pd.read_csv("data/Challenge_1_Validation_Work_Clean_NoString.csv")
output_scoring = clf.predict(score)
predicted_y= pd.DataFrame(output_scoring, columns=['Predicted_Readmit_30'])
probs = clf.predict_proba(score)
probs = pd.DataFrame(probs, columns=['Prob of NO', 'Prob of YES'])
readmit_patients = predicted_y.join(probs)
readmit_patients.to_csv("data/output_readmit_RandomForest_ScoringDataset.csv")
readmit_patients.head()
| | Predicted_Readmit_30 | Prob of NO | Prob of YES |
|---|---|---|---|
| 0 | 0 | 1.00 | 0.00 |
| 1 | 0 | 0.95 | 0.05 |
| 2 | 0 | 1.00 | 0.00 |
| 3 | 0 | 0.95 | 0.05 |
| 4 | 0 | 0.70 | 0.30 |
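All five rows above are predicted as 0, including the one with a 0.30 probability of readmission, because for two classes `predict` only returns 1 when the averaged probability of class 1 exceeds 0.5. If missing a readmission is costlier than a false alarm, a lower cut-off on `Prob of YES` is one option. The sketch below uses the five probabilities from the table; the 0.25 threshold and the `Flag_For_Review` column are hypothetical and do not come from the notebook.

```python
import pandas as pd

# Probabilities copied from the first five rows of the table above.
readmit_demo = pd.DataFrame({
    'Predicted_Readmit_30': [0, 0, 0, 0, 0],
    'Prob of YES': [0.00, 0.05, 0.00, 0.05, 0.30],
})

threshold = 0.25  # hypothetical cut-off, chosen only for illustration

# Flag every patient whose readmission probability reaches the cut-off.
readmit_demo['Flag_For_Review'] = (readmit_demo['Prob of YES'] >= threshold).astype(int)
print(readmit_demo)
```

Lowering the threshold usually raises recall for the readmitted class at the cost of precision, so the cut-off would need to be chosen on held-out data.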
# finally, join the predictions and probabilities onto the scoring data
data1 = score.join(readmit_patients)
data1.head(10)
| | race | gender | age | weight | admission_type_id | discharge_disposition_id | admission_source_id | time_in_hospital | payer_code | MED_SPEC_NUM | num_lab_procedures | num_procedures | num_medications | number_outpatient | number_emergency | number_inpatient | DIAG_CAT_1 | DIAG_CAT_2 | DIAG_CAT_3 | number_diagnoses | max_glu_serum | A1Cresult | metformin | glimepiride | glipizide | glyburide | pioglitazone | rosiglitazone | insulin | change | diabetesMed | Predicted_Readmit_30 | Prob of NO | Prob of YES |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 3 | 1 | 4 | 0 | 3 | 11 | 1 | 2 | 7 | 19 | 48 | 4 | 11 | 0 | 0 | 0 | 16 | 14 | 12 | 9 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1.00 | 0.00 |
| 1 | 3 | 0 | 4 | 0 | 1 | 1 | 7 | 2 | 1 | 0 | 31 | 1 | 28 | 0 | 0 | 0 | 27 | 3 | 17 | 5 | 0 | 0 | 2 | 0 | 0 | 2 | 0 | 2 | 3 | 1 | 1 | 0 | 0.95 | 0.05 |
| 2 | 3 | 1 | 6 | 0 | 6 | 7 | 7 | 1 | 0 | 18 | 42 | 0 | 12 | 0 | 0 | 0 | 23 | 32 | 3 | 3 | 2 | 0 | 0 | 0 | 0 | 2 | 0 | 0 | 0 | 0 | 1 | 0 | 1.00 | 0.00 |
| 3 | 3 | 0 | 7 | 0 | 1 | 1 | 7 | 5 | 0 | 0 | 52 | 2 | 25 | 1 | 1 | 0 | 24 | 12 | 13 | 9 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 2 | 1 | 1 | 0 | 0.95 | 0.05 |
| 4 | 3 | 0 | 4 | 0 | 2 | 1 | 7 | 2 | 14 | 55 | 41 | 2 | 3 | 0 | 0 | 0 | 15 | 17 | 2 | 9 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.70 | 0.30 |
| 5 | 3 | 1 | 4 | 0 | 5 | 1 | 1 | 5 | 6 | 0 | 1 | 2 | 25 | 1 | 0 | 0 | 32 | 9 | 18 | 5 | 0 | 0 | 2 | 0 | 0 | 2 | 0 | 2 | 2 | 1 | 1 | 0 | 0.85 | 0.15 |
| 6 | 1 | 0 | 8 | 0 | 1 | 3 | 7 | 1 | 7 | 0 | 58 | 1 | 11 | 1 | 0 | 1 | 23 | 13 | 23 | 7 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 2 | 0 | 1 | 0 | 0.95 | 0.05 |
| 7 | 3 | 1 | 5 | 0 | 1 | 6 | 7 | 5 | 7 | 0 | 54 | 0 | 13 | 0 | 0 | 0 | 6 | 17 | 26 | 8 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 2 | 0 | 1 | 0 | 0.95 | 0.05 |
| 8 | 3 | 0 | 8 | 0 | 2 | 3 | 1 | 3 | 0 | 0 | 48 | 0 | 10 | 0 | 0 | 0 | 17 | 3 | 18 | 7 | 0 | 1 | 0 | 0 | 0 | 2 | 0 | 0 | 2 | 1 | 1 | 0 | 0.85 | 0.15 |
| 9 | 3 | 0 | 8 | 0 | 1 | 3 | 7 | 4 | 1 | 0 | 41 | 0 | 14 | 0 | 0 | 1 | 24 | 18 | 3 | 9 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 1 | 1 | 0 | 0.95 | 0.05 |